(57) Abstract

The gene for the autosomal recessive neurodegenerative disorder
Spinal Muscular Atrophy has been mapped to a region of chromosome 5.
The gene encodes a protein having homology with apoptosis inhibitor
proteins of viruses, so the encoded protein has been labelled a
neuronal apoptosis inhibitor protein (NAIP). A deletion in the NAIP
domain was identified in persons with Type I, II and III Spinal
Muscular Atrophy.

NEURONAL APOPTOSIS INHIBITOR PROTEIN, GENE SEQUENCE
AND MUTATIONS CAUSATIVE OF SPINAL MUSCULAR ATROPHY

FIELD OF THE INVENTION
The gene for the neuronal apoptosis inhibitor
protein (NAIP) has been identified in the q13 region of
chromosome 5. Mutations in this gene have been diagnosed
in individuals with Type I, II and III Spinal Muscular
Atrophy. The amino acid sequence of the neuronal
apoptosis inhibitor protein is provided and homology to
viral apoptosis proteins demonstrated.

BACKGROUND OF THE INVENTION 

In order to facilitate reference to various journal
articles in the discussion of various aspects of this
invention, a complete listing of the references is
provided at the end of the disclosure. Otherwise the
references are identified in the disclosure by first
author's name and publication year of the reference.

The childhood spinal muscular atrophies (SMAs) are a
group of autosomal recessive, neurodegenerative disorders
classified into three types based upon the age of onset
and clinical progression (Dubowitz et al., 1978; Dubowitz
et al., 1991). All three types are characterized by the
degeneration of the alpha motor neurons of the spinal
cord manifesting as weakness and wasting of the proximal
voluntary muscles. Type I SMA is the most severe form
with onset either in utero or within the first few months
of life. Affected children are unable to sit unsupported
and are prone to recurrent chest infections due to
respiratory insufficiency, thus rarely surviving the
first few years of life (Dubowitz et al., 1978; Dubowitz
et al., 1991). This acute form, with a carrier frequency
of 1/60 to 1/80, is one of the most frequent fatal
autosomal recessive disorders. Children affected with
Type II SMA never walk unaided and, although the prognosis
is variable, such children may die in adolescence. Those
affected with Type III SMA maintain independent
ambulation but develop weakness at any time between the ages
of 3 and 17 years, manifesting a mildly progressive course
(Dubowitz et al., 1978; Dubowitz et al., 1991).

In 1990, all three childhood forms of SMA were
genetically mapped to the long arm of chromosome 5 at
5q11.2-13.3 (Brzustowicz et al., 1990; Gilliam et al.,
1990; Melki et al., 1990). Subsequent multi-point
linkage analyses and the identification of recombinant
events have further localized the genetic defect to the
region flanked centromerically by D5S435/D5S629 (Soares
et al., 1993; Wirth et al., 1993; Clermont et al., 1994)
and telomerically by MAP1B/D5S112 (Wirth et al., 1994;
MacKenzie et al., 1993; Lien et al., 1991). This
interval has been refined by the more recent
identification of recombination events indicating that
the SMA gene lies distal to CMS-1 (Yaraghi et al.,
submitted to Human Genetics; van der Steege et al.,
submitted to Human Genetics) and proximal to D5S557
(Francis et al., 1993). We and others have detected
chromosome 5-specific repetitive sequences with
particular abundance in the D5S629/CMS-D5S557 region
(Francis et al., 1993; Thompson et al., 1993), which have
impeded the isolation and ordering of both clones and
simple tandem repeats. An array of cosmid clones
spanning the 200 kb CMS-1 (Kleyn et al., 1993)/CATT-1
(Burghes et al., 1994; McLean et al., in
press)/D5F150/D5F149/D5F153 (Melki et al., 1994) region
within this interval has been constructed.

We established a contiguous array of YAC clones
encompassing the SMA containing D5S435-D5S112 interval
of 5q13.1. We then discovered a gene within this
interval of 5q13.1 which coded for a neuronal apoptosis
inhibitor protein (NAIP). Further studies demonstrated
that a deletion in this gene was found in Type I, II and
III Spinal Muscular Atrophy.

SUMMARY OF THE INVENTION

A gene encoding a neuronal apoptosis inhibitor
protein (NAIP) was discovered in the q13 region of human
chromosome 5. According to an aspect of the invention, the
cDNA sequence coding for the neuronal apoptosis inhibitor
protein is provided and set out in Table 4. According to
another aspect of the invention, the predicted amino acid
sequence of the neuronal apoptosis inhibitor protein is
provided from the cDNA sequence.

According to another aspect of the invention, a
deletion of the neuronal apoptosis inhibitor protein gene
was discovered in persons with Type I, II and III Spinal
Muscular Atrophy disease. The discovery of the neuronal
apoptosis inhibitor protein gene deletion provides a
diagnostic indicator for use in the diagnosis of Spinal
Muscular Atrophy.

In order to facilitate a further description of
various aspects of the invention, reference will be made
to various Figures of the drawings. A brief description
of the drawings follows this invention summary section.

According to a further aspect of the invention, a
human gene is provided which maps to the SMA containing
region of chromosome 5q13. The gene comprises exons 1
through 17 of approximately 5.5 kb and has a
restriction map for exons 2 through 11, as shown in
Figure 8.

According to a further aspect of the invention,
exons 1 through 17 have a restriction map for exons 2
through 16, as shown in Figure 9D.

According to another aspect of the invention, a
human gene of the above aspects wherein exons 5 through
16 code for the NAIP protein having an amino acid
sequence biologically functionally equivalent to the
amino acid sequence of Sequence ID No. 2.

According to another aspect of the invention, the
human gene of the above aspects has exons 5 through 16
with a cDNA sequence biologically functionally equivalent
to the cDNA sequence of Sequence ID No. 1.

According to another aspect of the invention, a
purified nucleotide sequence comprises genetic DNA, cDNA,
mRNA, anti-sense DNA or homologous DNA corresponding to
the cDNA sequence of Sequence ID No. 1.

According to another aspect of the invention, a DNA
molecule sequence codes for the NAIP protein having
Sequence ID No. 2.

According to another aspect of the invention, a
purified DNA sequence consists essentially of DNA
Sequence ID No. 1.

According to another aspect of the invention, a
purified DNA sequence consists essentially of a DNA
sequence coding for amino acid Sequence ID No. 2.

According to another aspect of the invention, a
purified DNA sequence comprises at least 18 sequential
bases of Sequence ID No. 1. DNA probes, PCR primers, DNA
hybridization molecules and the like may be provided by
using the purified DNA sequence of at least 18 sequential
bases.

According to another aspect of the invention, use of
the DNA sequences of the above aspects in the
construction of a cloning vector or an expression vector.

According to another aspect of the invention, NAIP
protein encoded by the above DNA sequences.

According to another aspect of the invention, NAIP
protein comprising an amino acid sequence biologically
equivalent to the amino acid sequence of Sequence ID No.
2.

According to another aspect of the invention, NAIP 
protein consisting essentially of the amino acid sequence 
of Sequence ID No. 2. 

According to another aspect of the invention, a NAIP
protein fragment comprises at least 15 sequential amino
acids of Sequence ID No. 2.

According to another aspect of the invention, use of 
the above amino acid sequences in the production of 
hybridomas. 

According to another aspect of the invention, a
method is provided for analyzing a biological sample to
determine the presence or absence of a gene encoding NAIP
protein.

The method comprises:

i) providing a biological sample derived from the
SMA containing region q13 of chromosome 5;

ii) conducting a biological assay to determine
presence or absence in the biological sample of at least
a member selected from the group consisting of:

a) NAIP DNA Sequence ID No. 1, and

b) NAIP protein Sequence ID No. 2.

DESCRIPTION OF DRAWINGS

The original numbering of exons for the NAIP gene
began with exon 0 and progressed through exon 16. This
is identified in drawings as sequence numbering Scheme
#1. However, for conventional exon numbering, it is
preferable to begin with exon 1 and progress through to
exon 17. This is now identified as sequence numbering
Scheme #2.
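
The relationship between the two numbering schemes can be
sketched as a simple offset of one. The following is only an
illustrative helper; the function names are hypothetical and do
not appear in the specification.

```python
# Minimal sketch: convert exon numbers between the two schemes.
# Scheme #1 runs from exon 0 to exon 16; Scheme #2 runs from
# exon 1 to exon 17. Function names are illustrative assumptions.

def scheme1_to_scheme2(exon: int) -> int:
    """Convert a Scheme #1 exon number (0-16) to Scheme #2 (1-17)."""
    if not 0 <= exon <= 16:
        raise ValueError("Scheme #1 exons run from 0 to 16")
    return exon + 1


def scheme2_to_scheme1(exon: int) -> int:
    """Convert a Scheme #2 exon number (1-17) to Scheme #1 (0-16)."""
    if not 1 <= exon <= 17:
        raise ValueError("Scheme #2 exons run from 1 to 17")
    return exon - 1
```

For example, exons 4 and 5 of Scheme #1 correspond to exons 5
and 6 of Scheme #2.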

Figure 1: YAC contiguous array of the SMA gene
region. YACs are represented by solid lines. Open
triangles represent polymorphic STRs, solid triangles
represent STSs, open squares represent single copy
probes. The genetically defined SMA interval,
CMS-1-SMA-D5S557, and the previous D5S629-SMA-D5S557
interval are indicated above the YACs.

Figure 2: Long range restriction map of the SMA
region. Rare cutter sites are indicated above the solid
line. A minimal set of markers is indicated below the
solid line. t corresponds to the pYAC4 tryptophan or left
end. u corresponds to the pYAC4 uracil or right end. The
genetically defined CMS-1-SMA-D5S557 and the
D5S629-SMA-D5S557 intervals are estimated at 550 kb and
1.1 Mb respectively.

Figure 3: Amplification of the CATT-1 locus.
Allele sizes are shown below each lane. (A)
Amplification of YACs. G: genomic DNA. (B) Amplification
of cosmids derived from the chromosome 5 flow sorted
library. The 4 distinct alleles are represented by
cosmids 40G1 (allele 15), 58G12 (allele 12), 192F7
(allele 10) and 250B6 (allele 7).

Figure 4: A representative subset of mapped cosmids
from our contiguous array. Vertical lines above the
solid line are the positions of EcoRI sites. Open
triangles represent polymorphic STRs, filled triangles
represent STSs, filled squares represent single copy
probes and open squares represent transcribed sequences.
The STRs which demonstrate strong linkage disequilibrium
with Type I SMA are indicated by stars. Cosmids 1G3 and
1B9 are from the YAC 76C1 cosmid library.

Figure 5: Sequence duplication in the SMA region
identified by p151.2. Hybridization of YACs with (A) the
700 bp fragment and (C) the 500 bp fragment. YACs are
arranged from left to right, centromeric to telomeric.
Hybridization of cosmids with (B) the 700 bp fragment and
(D) the 500 bp fragment. (B) The 12 kb fragment is
detected in the cosmids; however, the 20 kb fragment is not
present. The 2.5 kb and 600 bp fragments detected in 3B3
and 1E1 respectively are end fragments. (D) Only the 3 kb
fragment is detected in the cosmids. Note the absence of
the 20 kb band in 24D6 in (A) but its presence in (C).
The 700 bp fragment may be deleted in 24D6.

Figure 6: Degree of linkage disequilibrium observed
between Type I SMA and various polymorphic 5q13.1 markers
giving a disequilibrium peak at 40G1.

Figure 7: A PAC contiguous array containing the
CATT region comprised of nine clones and extending
approximately 400 kb. The 2.2 kb transcript referred to
as GA1 is shown.

Figure 8: Structural organization of the SMA gene.
The exons are represented by black boxes and numbered
above. The positions of restriction sites are shown: B,
BamHI; E, EcoRI; N, NotI. Exons 4 and 5 (Scheme #1), or
exons 5 and 6 of Scheme #2, are frequently deleted in all
types of SMA.

EcoRI/BamHI band deleted in Figure 14 is also depicted.
The 6 kb region containing exons 5 and 6 (Scheme #2) and
the 23 kb BamHI fragment resulting from this deletion are
both shown in Figures 11C and 11D. The location of
primers utilized to identify deletions of exons 5 and 6 as
well as those that identify the truncated fragment in the
deleted NAIP gene are shown above the NAIP structure.

Figure 12: Intron/exon splice sequences of the NAIP
gene.

Figure 13: Northern blot of adult tissues probed
with exon 13 (Scheme #2) of the NAIP locus. Tissues are
as marked and the filters were washed at 50 °C, 0.2X SSC
and exposed for 4 days. Bands can be seen in liver and
placenta in the 6-7 kb range.

Figure 14: Pedigree and Southern blot analysis of
consanguineous French-Canadian type III SMA families.
Upper panel: probing of a filter containing BamHI/EcoRI
digested genomic DNA with a cDNA probe encompassing exons
2 through 9 (Scheme #2) of NAIP reveals the loss of the
4.8 kb fragment that contains exons 5 and 6 (Scheme #2)
in all affected individuals, resulting in an in-frame
deletion. All others, save for the homozygous normal
sister and brother, show half dosage for this band. The
lower panel shows a BamHI digest of the same family. In
affected individuals two superimposed 14.5 kb contiguous
fragments have sustained the 6 kb deletion of sequence
containing a BamHI site, resulting in the generation of a
23 kb band (see Figure 11). Note the existence of the 23
kb BamHI band in all individuals in the pedigree, in
keeping with its general dispersion in the population.
Similarly, the 9.6 kb BamHI band representing the
deletion of exons 1 through 6 (Scheme #2), which is
contained in PAC 238D12 and depicted in Figure 11, can be
seen in all individuals including non-SMA carriers.

Figure 15: Results of PCR amplification in type 3
families 21470 and 24561 using primers 1864 and 1863
which amplify exon 5 (Scheme #2). The reactions were
multiplexed with exon 13 (Scheme #2) primers 1258 and
1343 to rule out PCR failure obscuring the results.
Failure of amplification in keeping with the homozygous
absence of exon 5 (Scheme #2) can be seen to co-segregate
with the disease phenotype.

Figure 16: RT-PCR amplification of RNA from SMA and
non-SMA tissues. The letter n refers to RNA from non-SMA
tissue and a to RNA from SMA affected tissue. The tissue
source is shown above each panel. Lym refers to
lymphoblast and fib to fibroblast. All samples were from
type 1 SMA patients with the exception of a5 which is
from an affected member of the consanguineous type 3 SMA
family 24561 shown in Figure 15.

RNA was reverse transcribed from exon 13 (Scheme
#2). Primary PCR of products shown in panels A and B was
with exon 1 primer 1884 and exon 13 primers 1285 or 1974
and those in panel C with exon 6 primer 1919 and exon 13
primer 1285. Secondary PCR reactions for panel A used
exon 4 primer 1886 and exon 13 primer 1974; for panel B,
exon 5 primer 1864 and exon 11 primer 1979; and for panel
C, exon 9 primer 1844 and exon 13 primer 1974.

Failure of amplification, or amplification of reduced-size
products, can be seen in panel A for spinal cord and
lymphoblast tissue for samples a2, a3, a4, a5, a6 and a7.
Panel B also shows amplification of reduced size bands in
a2 and a3, and in a7 a larger product in keeping with an
insertion. Panel C shows reduced band size in keeping with
deletions of exons 11 and 12 (Scheme #2) in a2, a3, a9 and
a11.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

Unless indicated otherwise, reference to exons in
this detailed description of the invention will be based
on exon numbering Scheme #2.

Throughout the specification, various letter
abbreviations will be used to identify various components
or techniques. The following glossary is provided to
reference these items.

CTR - complex tandem repeat

DNA - deoxyribonucleic acid

PCR - polymerase chain reaction

PFGE - pulsed field gel electrophoresis

PAC - P1 artificial chromosome

RNA - ribonucleic acid

RT-PCR - reverse transcriptase-polymerase chain
reaction

STR - simple tandem repeat

STS - sequence tagged site

YAC - yeast artificial chromosome



This invention is directed to the identification,
location and sequence characteristics of a gene which
encodes Neuronal Apoptosis Inhibitor Protein (NAIP). We
have established that mutations in this gene are
causative of the previously discussed Types I, II and III
of Spinal Muscular Atrophy (SMA). It is believed that
mutations in this gene result in a lack of production of
normal NAIP protein, which is believed to be
physiologically involved in the normal human process of
maintaining neurological cells and preventing the early
cell death common to affected individuals. The subject gene
maps to the SMA containing region of chromosome 5q13.1.
Unless indicated otherwise, reference to exons in this
detailed description of the invention will be based on
exon numbering Scheme #2. The gene comprises exons 1
through 17 of approximately 5.5 kb and has a restriction
map for exons 2 through 11, as shown in Figure 8. An
updated restriction map for exons 2 through 16 is
provided in Figures 9D and 11A. As is appreciated, the
gene is considerably longer than the sequence for exons 1
through 17. Considerable intron information exists
between the exons which has not yet been sequenced. From
the standpoint of diagnosing SMA, the sequence
information of exons 1 through 17 is very valuable. The
normal sequence is provided in Table 4, as well as being
listed under Sequence ID No. 1. Any genetic mutation,
that is, any change in the DNA sequence, whether due
to deletion, entire absence of the gene, substitution or
polymorphisms and the like, is or can be causative of
the disease. The most common mutations are thought to
be:

i) deletion of exons 5 and 6 of the gene; or
ii) absence or marked reduction in the copy number
of this gene in chromosome 5, which can be causative if
the remaining genes are defective.

Any form of biological assay may be employed to
diagnose a person's susceptibility to SMA by virtue of
conducting a biological assay to determine the normal
sequence or absence or presence of mutations in the
normal sequence. Such biological assays may include DNA
hybridization by use of DNA probes and the like,
restriction enzyme analysis, PCR amplification of the
relevant portions of the sequence, messenger RNA
detection and DNA sequencing of the relevant portions of
the sequence, as isolated from chromosome 5 of the human
biological sample. It is appreciated that a variety of
the above generally identified biological assay
procedures may be conducted where the preferred
techniques are as follows:

SMA diagnoses will be conducted in two ways.
Initially, the genome of the human at risk will be
assayed for the absence of NAIP exons 5 and 6. These
exons are found to be absent with a frequency of 0.05% in
the general population and 50% in Type 1 SMA. The second
approach will be to assess the number of copies of the
NAIP gene in the individuals being tested. We have
observed that there is a general depletion of both
deleted and intact forms of the NAIP gene in individuals
with SMA. By using a densitometric approach to assess
the number of gene copies, an accurate assessment of the
risk of having SMA can be established. The best correlation
is observed for exons 2 through 4 and exon 13.

In practical terms, the two steps outlined above
will be conducted in the following manner:

(i) Two concurrent PCR reactions will be carried
out upon the same aliquot of DNA (0.1 micrograms) from
the human in question. One primer pair will map into
exons 5 and 6 (e.g. primers 1863 Sequence ID No. 7 and
1864 Sequence ID No. 8) and one pair will be homologous
to a region outside of exons 5 and 6 (primers 1343
Sequence ID No. 5 and 1258 Sequence ID No. 4). The
latter reaction will be performed to ensure that the PCR
is functioning. Two additional controls will be (a) PCR
performed on genomic DNA known to contain exons 5 and 6
employing the appropriate primers to ensure that this
particular reaction is working, and (b) negative controls
using water as a template to ensure absence of
contamination. All PCR products will be placed in an
agarose gel, separated electrophoretically and analyzed
visually.
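
The decision logic implied by the reactions and controls of
step (i) can be sketched as follows. This is a minimal
illustration only: the class, field names and result strings are
hypothetical, and the order in which the controls are checked is
an assumption rather than a stated part of the procedure.

```python
# Minimal sketch of how the visual gel readout of step (i) might
# be interpreted. All names and result strings are illustrative
# assumptions; the specification describes visual analysis only.
from dataclasses import dataclass


@dataclass
class PcrReadout:
    exon5_6_band: bool           # test DNA, exon 5/6 primers (e.g. 1863/1864)
    outside_band: bool           # test DNA, primers outside exons 5 and 6
    positive_control_band: bool  # genomic DNA known to contain exons 5 and 6
    water_control_band: bool     # water used as template


def interpret(readout: PcrReadout) -> str:
    """Return a plain-language reading of one set of reactions."""
    if readout.water_control_band:
        return "invalid: band in water control suggests contamination"
    if not readout.positive_control_band:
        return "invalid: exon 5/6 reaction failed on known-positive DNA"
    if not readout.outside_band:
        return "invalid: PCR did not function on the test DNA"
    if readout.exon5_6_band:
        return "exons 5 and 6 detected"
    return "exons 5 and 6 not detected: consistent with homozygous absence"
```

Checking the controls before the test lanes reflects the stated
purpose of those controls: a missing exon 5/6 band is only
informative when the other reactions behave as expected.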

(ii) Densitometric assessment of SMA risk will be
carried out by using PCR primers tagged with fluorescent
dyes. PCR reactions employing primers for exons 2
through 4, exon 13 as well as exons 5, 6 and exons 11,
12 will be performed on genomic DNA from the individual
being assessed. PCR products will be separated
electrophoretically on a gel and the intensity of the
individual bands assessed fluorometrically. These values
will be correlated with normative values and SMA risk
thus ascertained.
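
One way the comparison with normative values in step (ii) might
be expressed is as a per-region ratio of measured band intensity
to the normative intensity. The sketch below is an assumption
about how such a comparison could be computed; the specification
does not state a formula, and the region labels and numbers in
the usage note are placeholders, not measured data.

```python
# Minimal sketch: express each band intensity relative to a
# normative value for the same region. The ratio approach, the
# function name and the region labels are assumptions.
from typing import Dict


def relative_dosage(sample: Dict[str, float],
                    normative: Dict[str, float]) -> Dict[str, float]:
    """Return sample intensity divided by normative intensity per region."""
    missing = set(normative) - set(sample)
    if missing:
        raise KeyError(f"no sample intensity for: {sorted(missing)}")
    ratios = {}
    for region, reference in normative.items():
        if reference <= 0:
            raise ValueError(f"normative value for {region} must be positive")
        ratios[region] = sample[region] / reference
    return ratios
```

With placeholder intensities of 50.0 for "exons 2-4" and 100.0
for "exon 13" against normative values of 100.0 each, the
function returns ratios of 0.5 and 1.0. How such ratios relate to
SMA risk is not specified here and would depend on the normative
data.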

It is apparent that one's level of NAIP correlates
with the risk for other neurodegenerative disorders such
as amyotrophic lateral sclerosis and Alzheimer's disease.
Consequently, the tests outlined above serve as
predictors of risk for these disorders as well. As is
described in more detail in the section under heading
Baculoviral IAPs, the NAIP protein has significant
homology with proteins for inhibiting cell apoptosis.
Hence, any neurodegenerative disease which is based on

Sodium acetate and ethanol are added. This is
precipitated overnight at -20 °C. The pellet is
collected after microcentrifugation. The pellet is
washed with ethanol. Then water, Tris-HCl, and KCl are
added and the mixture is heated to 90 °C and then cooled
slowly to 67 °C. Microcentrifuge and incubate 3 hours at
52 °C. This final annealing temperature may be adjusted
according to base composition of primer. Alternatively,
the primer can be annealed to the RNA by mixing
poly(A)+ RNA, cDNA primer, and water. This mixture is
heated 3 to 15 minutes at 65 °C. To the cooled mixture,
add reverse transcriptase buffer.

The cDNA is now synthesized. Add reverse
transcriptase buffer and AMV reverse transcriptase. This
is mixed and incubated 1 hour at 42 °C (depending on the
base composition of primer and RNA). Add Tris-Cl/EDTA,
mix, then add buffered phenol and vortex. Microcentrifuge and
add chloroform to the aqueous phase and vortex.
Microcentrifuge. Add sodium acetate and ethanol to
aqueous phase. Mix and precipitate overnight at -20 °C.
Microcentrifuge, dry pellet, and resuspend in water.

The cDNA is then amplified by PCR. The mixture
contains prepared cDNA, amplification primers, dNTP mix,
amplification buffer, and water. Usually one of the
amplification primers is the same as cDNA primer. If a
different amplification primer is used, the cDNA primer
should be removed from the cDNA reaction. The reaction
mixture is then heated 2 minutes at 94 °C, and
microcentrifuged to collect condensate. Add Taq DNA
polymerase, mix, centrifuge, overlay with mineral oil.
Set up amplification cycles. The number of cycles is
varied depending upon the abundance of RNA. Forty cycles
are usually sufficient. The products are then analyzed by
gel electrophoresis in agarose or nondenaturing
polyacrylamide gels. The cDNA can also be introduced
directly into the amplification step.

In referencing the gene, its cDNA sequences, other
DNA sequences and RNA sequences, it is understood that
any specifically referenced sequence includes any and all
biologically functional equivalents thereof. Similarly,
with listed protein sequences, it is understood that such
terminology includes any and all biologically functional
equivalents thereof insofar as the intended purpose is
concerned. In the above identified biological assays it
is understood that the full length or partial length
sequences of the DNA or protein may be used. Generally
it is contemplated that at least 18 sequential bases of
the DNA sequence are useful as hybridization probes, PCR
primers and the like. Similarly, with protein sequences,
at least 15 sequential amino acids may be
correspondingly useful in developing protein receptors
such as monoclonal antibodies. Such monoclonal
antibodies may be made in accordance with the standard
techniques by developing hybridomas for producing
monoclonals specific to certain antigenic determinants of
the protein structure.
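
The notion of "at least 18 sequential bases" can be illustrated
by enumerating every run of 18 consecutive bases in a sequence.
The sketch below uses a made-up toy sequence, not any part of
Sequence ID No. 1, and the function name is hypothetical. It does
not assess whether a given run would perform well as a probe or
primer.

```python
# Minimal sketch: list every window of `length` consecutive bases.
# The toy sequence in the usage note is invented for illustration
# and is not taken from Sequence ID No. 1.
from typing import List


def sequential_windows(sequence: str, length: int = 18) -> List[str]:
    """Return all runs of `length` consecutive bases, in order."""
    if length < 1:
        raise ValueError("length must be at least 1")
    seq = sequence.upper()
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]
```

A 20-base toy sequence such as "ACGTACGTACGTACGTACGT" yields
three 18-base windows; a sequence shorter than 18 bases yields
none. The same windowing applies to runs of 15 sequential amino
acids by passing length=15.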

With reference to Table 4, it would appear that in
view of the significant homology of exons 5, 6, 7, 8, 9,
10, 11 and 12 with the IAP domains, such homology may well
mean that any deletions or other forms of mutations in
these exons may result in the carrier being susceptible
to the disease. For example, this is evidenced by the
deletion of exons 5 and 6 in low copy numbers in humans
being causative of the disease. Hence, any of the
sequence information in this region of the gene will be
important from a diagnosis standpoint so that any
sequential 18 bases of DNA or 15 sequential amino acid
residues in this region may be relied on in the diagnosis
of SMA in suspected humans. It is of course also
understood that other forms of deletions, mutations,
polymorphisms and the like in other regions of the gene
may be causative of the disease or may be used for other
purposes in conjunction with disease analysis, prognosis
and perhaps treatment.

Although the restriction maps are useful in
identifying the characterizing features of the subject
gene, the specific cDNA sequence of exons 1 through 17 has
been provided in Sequence ID No. 1. The encoding portion
of the sequence commences at the ATG codon of base 396 of
exon 5. The encoding portion ends at the stop codon TAA
of exon 16 at base position 4092. Exons 1 through 4 are
at the 5' untranslated region and exon 17 is at the
3' untranslated region. As with some genetic related
diseases, mutations or polymorphisms in the untranslated
regions may as well be causative of the disease so that
sequence portions in the form of probes and the like in
regions other than the region of significant IAP homology
may be valuable in the diagnosis of SMA. It is also
understood that the sequence information of Sequence ID
No. 1 may be used in the construction of suitable cloning
vectors for purposes of producing multiple copies of the
gene or expression vectors for purposes of transfecting a
host to produce significant quantities by recombinant
techniques of the NAIP protein. Sections or fragments or
full-length sequence information may be used in the
construction of the cloning vectors or expression vectors
depending upon the end use of such vectors. With this
understanding, the details in respect of the
identification of the SMA disease gene, its
characteristics, the corresponding protein sequence and
their uses in diagnosis are explained.
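
The stated coordinates of the encoding portion can be checked
for reading-frame consistency with a short calculation. The
sketch below assumes that base 396 is the first base of the ATG
codon and that base 4092 is the first base of the TAA stop codon;
the text does not say which base of the stop codon is meant, so
this is an assumption. Under it, the span from ATG up to but not
including the stop codon is 4092 - 396 = 3696 bases, a whole
number of codons. Taking 4092 as the second or third base of TAA
would not give a multiple of three.

```python
# Minimal sketch: count codons between the start codon and the
# stop codon. Treating both positions as the FIRST base of their
# codon is an assumption, not something the text states.

ATG_FIRST_BASE = 396    # from the text: ATG codon at base 396
TAA_FIRST_BASE = 4092   # assumed: first base of the TAA stop codon


def coding_codons(start_first_base: int, stop_first_base: int) -> int:
    """Number of codons from the start codon up to, not including, the stop."""
    span = stop_first_base - start_first_base
    if span <= 0 or span % 3 != 0:
        raise ValueError("start and stop are not in the same reading frame")
    return span // 3
```

Calling coding_codons(ATG_FIRST_BASE, TAA_FIRST_BASE) gives 1232
codons, counting the initiating ATG.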

A YAC contig of the Spinal Muscular Atrophy (SMA)
disease gene region along chromosome 5q13 was produced
which incorporated the D5S435-D5S112 interval and
encompassed 4 Megabases. The CATT-40G1 subloci on the
cosmid array showed significant linkage disequilibrium
with Spinal Muscular Atrophy indicating close proximity
to the gene. However, delineation of the precise region
containing the SMA gene was not possible based on this
information alone. A PAC contiguous array containing the

Construction of YAC Contig

YAC clones were isolated from three libraries,
constructed at the National Centers of Excellence (NCE,
Toronto), the Imperial Cancer Research Fund (ICRF,
London) (Larin et al., 1991) and the Centre d'Etude du
Polymorphisme Humaine (CEPH, Paris) (Albertson et al.,
1990), all of which were prepared from partial EcoRI
digests of total DNA ligated into the YAC vector pYAC4.
ICRF YAC clones were identified by probing library
filters with 5q13.1 probes. YAC DNA from the NCE library
was screened by PCR amplification, electrophoresed,
immobilized onto Southern blots and hybridized with the
radiolabeled STS product to identify positives.
Numerous positives were obtained repeatedly in both the
initial round of PCR of pooled plates and the second
round with the plate(s) thought to contain the clone of
interest, many of which proved to be false positives. The
number of false positives obtained, which appeared to be
primer dependent, was reduced by radiolabelling PCR
products and resolving these on 6% polyacrylamide gels.
The true positives could then be sized accurately without
interference from spurious products.

Yeast strains with YACs positive for 5q13.1 STSs
were grown on selective plates and examined for stability
in the following manner: 4 colonies of each were grown
for preparation in agarose blocks, yeast chromosomal DNA
was separated by pulsed field gel electrophoresis and
transferred to filters, and the size and number of YAC
clones contained within each yeast colony were determined
by hybridization with radiolabeled total human genomic
DNA. Positive clones were confirmed either by
hybridization or PCR amplification with the original
probe. Only YAC 24D6-2 contained some colonies with more
than one YAC.

YAC end clones and inter-Alu products were isolated
by vector-Alu PCR and inter-Alu PCR respectively. The
location of these products within 5q11-13 was confirmed
analysis. Internal YAC products generated by Alu PCR were
utilized to probe all YACs, establishing the degree of
overlap. STS sequences (Kleyn et al., 1993) mapping
between JK348 and D5S112 were utilized to confirm the
degree of overlap and the orientation of YACs in the
contig. Concurrently the order of each STS along 5q13
was confirmed. In all, a total of 14 YACs were
identified, anchored by the genetic markers D5S435,
D5S629, CMS-1, CATT-1, D5F153, D5F149, D5F150, D5F151,
D5S557 and D5S112.

Long Range Restriction Map and
Estimation of Long Range Physical Distance

A restriction map of the critical SMA region was 

15 constructed from the STS Y116U (Kleyn et al., 1993), 

approximately 100 kb proximal to D5S629, to the STS Y107U . 
(Kleyn et al., 1993), which lies approximately 500 kb 
distal to D5S557 (see Figure 2). In order to detect any 
possibility of deletions or rearrangements in our YACS, 

20 additional YACs isolated from the CEPH library (Kleyn et 
al., 1993), mapping within this region were included in 
the analysis. YACs 24D62, 27H5, 33H10, 155H11, 76C1, 
235B7, 184H2, 428C5, and 81B11 (Kleyn et al., 1993) were 
partially digested utilizing the rare cutter restriction 

2 5 endonucleases NotI, BssHII, Sfil, and RsrI . Southern 

blots of the Pulse Field Gel Electrophoresis (PFGE) 
separated restriction products were hybridized with YAC 
left arm and right arm specific probes which revealed the 
positions of cleavage sites from both ends of each YAC. 

The orientation and overlap of the YACs had been
previously determined based on STS analysis; therefore
the positions of the rare cutter sites among the
overlapping YACs were compared. By aligning the
overlapping YACs at their common rare cutter sites, the
degree of overlap could be more precisely determined.
The long range restriction maps of the overlapping YACs
derived from different sources were mostly in agreement,
with the exception of 33H10 and 428C5. 428C5 has
previously been documented to contain a deletion (Kleyn
et al., 1993), evident by comparison of its STS content
and its size of only 300 kb, indicating that it lies
further centromeric than its placement in Figure 2. YAC
33H10, based on STS analysis, contains an internal
deletion, and YAC 155H11 is chimeric at its telomeric end;
therefore rare cutter sites at the telomeric end of the
map which could not be confirmed were not included. The
results indicate the distance from the centromeric
boundary D5S435 to the telomeric boundary D5S557 to be
1.4 Mb, in marked contrast to 400 kb as previously reported
(Francis et al., 1993) but in agreement with one other
estimate (Wirth et al., 1993). Furthermore, the
D5S629-D5S557 interval can be estimated at 1.1 Mb and the
distance of the genetically defined CMS1-SMA-D5S557
interval is approximately 550 kb.
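
The alignment step described above, in which two overlapping
YACs are registered at their common rare cutter sites, can be
sketched in a few lines of Python. This is an illustration only:
the function name, the clone lengths and the site positions are
hypothetical and are not taken from Figure 2.

```python
from statistics import mean

def estimate_overlap(len_a_kb, len_b_kb, shared_sites):
    """Estimate how two YACs overlap from rare cutter sites they share.

    shared_sites is a list of (position_in_a_kb, position_in_b_kb)
    pairs, each measured from the left arm of the respective YAC.
    Returns (offset, overlap): the position of B's left end in A's
    coordinates, and the length of the shared segment in kb.
    """
    # Each shared site gives one estimate of the offset; average them.
    offset = mean(a - b for a, b in shared_sites)
    start = max(0.0, offset)
    end = min(len_a_kb, offset + len_b_kb)
    return offset, max(0.0, end - start)

# Hypothetical example: a 500 kb YAC and a 400 kb YAC sharing two sites.
offset, overlap = estimate_overlap(500, 400, [(350, 50), (420, 120)])
print(offset, overlap)
```

Averaging over several shared sites is a design choice made here
to smooth out sizing error on the gel; the source states only that
the YACs were aligned at their common sites.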

Cosmid Contig Assembly from the Chromosome 5 Library 

Although the isolation of cosmids utilizing whole
YACs as probes could be an expeditious method of
constructing a cosmid contig, in this case the presence
of chromosome 5 specific repeats would likely result in
the isolation of cosmids mapping elsewhere on chromosome
5. A directed cosmid walking strategy was thus adopted.
The CATT-1 STR, which has been shown by irradiation
hybrid analysis to map approximately midway between the
two flanking markers D5S435 and D5S351 (Hudson et al.,
1992), was utilized as the initiation point for the
construction of a cosmid clone array. The complex
pattern of amplification seen on genomic DNA, with two to
eight alleles per individual (see Figure 3), suggested a
variable number of copies or loci of the CATT-1 sequence
in this region. Thirty CATT-1 positive cosmids were
identified which upon PCR analysis were seen to contain
one of four distinct alleles (see Figure 3). As the
cosmid library was derived from a monochromosomal source,
this confirmed that the CATT STR exists in at least four
locations, which we refer to as subloci. These subloci
are referred to as CATT-40G1, CATT-192F7, CATT-58G12 and
CATT-250B6, based on the cosmid addresses of the first
cosmids identified containing alleles of 12, 19, 15 and
20 cytosine adenosine (CA) dinucleotides respectively.
Bi-directional walking was initiated from these 4 cosmid
subloci. Positive hybridization was observed for cosmid
250B6 with one end of 58G12 and for 192F7 with the other
end, resulting in the ordering of cen-192F7-58G12-250B6-tel
(Figure 4). All cosmids which contained the CATT-192F7
allele were mapped to this location based on the
size of their CATT-1 allele and their restriction enzyme
profiles. As shown in Figure 4, the CATT-192F7 sublocus
is telomeric to the STR CMS-1, which itself lies
telomeric to the CATT-40G1 sublocus.
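
The way pairwise end-clone hybridizations yield a linear order
can be illustrated with a short Python sketch. It assumes that
each positive hybridization links two neighbouring cosmids and
that the centromeric-most clone is known from other evidence;
the function name is hypothetical.

```python
from collections import defaultdict

def order_clones(hybridizing_pairs, centromeric_end):
    """Walk from the centromeric clone through its hybridizing neighbours."""
    neighbours = defaultdict(set)
    for a, b in hybridizing_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)
    order = [centromeric_end]
    while True:
        # Step to a neighbour that has not been placed yet, if any.
        unvisited = sorted(c for c in neighbours[order[-1]] if c not in order)
        if not unvisited:
            return order
        order.append(unvisited[0])

# The two hybridizations reported in the text: 250B6 with one end
# of 58G12, and 192F7 with the other end.
pairs = [("250B6", "58G12"), ("192F7", "58G12")]
print("cen-" + "-".join(order_clones(pairs, "192F7")) + "-tel")
```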

Due to the presence of chromosome 5 specific
repetitive sequences, resulting in the identification of
cosmids from another region of chromosome 5, the
integrity of the contig was verified with each step
taken. Cosmid end clones generated by vector-Alu-PCR
were hybridized to somatic cell hybrid panels as
described above. As repetitive sequences which map
solely to the region of chromosome 5 that is deleted in
the hybrid cell line HHW1064 have been observed, cosmids
identified by end products which did not hybridize to
HHW1064 were analyzed further. Proof of overlap was
shown by hybridization of end clones, single copy probe
hybridization, STS content, and restriction enzyme
profile comparison. Cosmids identified by end clones
which hybridized to HHW1064 were eliminated and walking
was continued by utilizing a different inter-Alu product
from the clone of origin, which was verified in the same
manner. Cosmid sizes were calculated by the addition of
EcoRI restriction fragments and the extent of overlap was
determined by the addition of those fragments in common.
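
The size and overlap calculation just described amounts to two
sums, sketched below in Python. The fragment sizes are invented
for illustration, and the sketch assumes that fragments of equal
size in two cosmids are the same fragment, which in practice
would be confirmed by hybridization.

```python
from collections import Counter

def cosmid_size(fragments_kb):
    """Insert size as the sum of its EcoRI fragment sizes (kb)."""
    return sum(fragments_kb)

def overlap_size(fragments_a_kb, fragments_b_kb):
    """Overlap as the sum of the EcoRI fragments two cosmids share."""
    # Counter intersection keeps each shared size once per shared copy.
    shared = Counter(fragments_a_kb) & Counter(fragments_b_kb)
    return sum(shared.elements())

cosmid_a = [9.0, 6.5, 4.0, 2.5]   # hypothetical EcoRI fragments, kb
cosmid_b = [4.0, 2.5, 11.0, 7.0]  # hypothetical EcoRI fragments, kb
print(cosmid_size(cosmid_a), overlap_size(cosmid_a, cosmid_b))
```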
Cosmid Contig Assembly of YAC 76C1 Cosmids

As extension of the cosmid contiguous array was
prevented by the presence of chromosome 5 specific
repeats, a 5X cosmid library was produced from YAC 76C1.
The STSs CATT-1, CMS-1, Y122T (Kleyn et al., 1993), Y97T
(Kleyn et al., 1993) and Y98T (Kleyn et al., 1993), which
are distributed along the YAC, were utilized to identify
cosmids to assemble the contig. As well, the previously
developed markers pZY8, pL7, pGA-1, p15.1, p402.1,
p2281.8 and β-glucuronidase (Oshima et al., 1987) (Table
2, Figure 4) from the established cosmid contig were
hybridized to the library, providing an effective method
of ordering the cosmids. Cosmids demonstrating irregular
hybridization patterns and thought to contain deletions
and/or rearrangements were excluded.

The STS Y98T identified three cosmids, including one
previously identified by the probe p2281.8, derived from
a chromosome 5 library clone, 228C8, also containing the
STS Y98T. An end product of this cosmid hybridized to
ten cosmids. Concurrently, an end fragment of a CATT-40G1
sublocus was shown to hybridize to four of these ten
cosmids, thus linking CATT-40G1 and CMS-1 with the more
centromeric STS Y98T (Figure 4). We were unable to
identify any clones containing the YAC end STS Y97T.

Filter hybridization and STS mapping experiments
indicated a second, more telomeric location of the
CATT-40G1 sublocus. A duplication of this sublocus would
agree with genotype data in our SMA kindreds (McLean et
al., in press).

An EcoRI restriction map was generated utilizing a
minimal set of cosmids necessary to span the region. To
ensure the reliability of the contig, we sought to
integrate it with the contig constructed from the
chromosome 5 specific library. Concordance of the
contigs was evident by comparison of the restriction
maps, the position of probes and STSs on the map and
Alu-PCR fingerprinting. In this manner the size of the
contig was estimated to be 210 kb. A directed walking
strategy has thus resulted in the generation of a single
contiguous set of cosmids containing the CATT-1 cluster
of subloci with known centromere/telomere orientation.

Duplications/Deletions

Several lines of evidence suggested the presence of
genomic sequence duplications within our cosmid array.
We provide evidence for the duplication of the CATT-40G1
sublocus in cosmids derived from a single chromosome 5.
A centromeric location for this sublocus was established,
as the CATT-40G1 sublocus was found to be contiguous with
the STSs Y122T, Y88T and CMS-1 in several cosmids, and
the centromeric YAC 428C5 is positive for probes isolated
from the CATT-40G1 containing cosmids. Although YAC
428C5 does not contain the CATT-40G1 sublocus upon PCR
amplification, this may be explained either by a null
allele in the chromosome from which the YAC was derived
or a deletion in the YAC. We have previously observed
null alleles in individuals at distinct CATT-1 subloci.
A second, more telomeric location of CATT-40G1 was
determined by the hybridization to CATT-40G1 cosmids of
the probes pGA-1, pL7, and pZY8, all of which bind the
more telomeric YACs 33H10 and 24D62. The hybridization of
p402.1, derived from cosmid 40G1, to cosmids at both
locations would indicate that the duplication is not
restricted to the CATT-40G1 subloci and likely
encompasses a larger region. Southern blot analysis
revealed distinct profiles of cosmids for the two
locations; however, common bands were detected by Alu-PCR
fingerprinting, supporting a duplication.

Correlation of our YAC contig with the cosmid contig
revealed that YACs 76C1, 81B11, and 27H5 span the 150 kb
CATT region of 5q13. Despite this, CATT-1 genotyping of
these YACs revealed only one allele size, raising the
possibility that the chromosomes from which these YACs
were derived (4 in all) contain null alleles at their
remaining CATT-1 subloci. Our experience with CATT
linkage analysis of SMA families, however, indicated that
such a scenario is highly unlikely, as none of the
approximately 300 individuals genotyped had fewer than 2
alleles. We consequently believe it is more likely that
these CATT subloci are unstable and have been deleted
during YAC construction and/or propagation.

Sequence comparison between the CATT-1 and D5F153
primer sequences indicated that these two STRs were
similar and possibly the same, as one primer is identical
and the other primer sequences overlap by eight
nucleotides. However, the centromeric YACs 428C5,
232F12, 235B7 and 184H2 and the telomeric YACs 12H1,
155H11 and 269A6, which were CATT-1 negative, yielded
D5F153 amplification products, indicating that CATT-1 may
be a derivative of D5F153. These data, in combination with
D5F153 analyses of the cosmid contig, which contains
three D5F153 loci (Figure 4), indicated that at least
five D5F153 subloci exist.

In addition to the CATT-1 and D5F153 STRs, the STRs
CMS-1 and D5F150 were present in a variable number of
copies per chromosome 5. STS analysis localized CMS-1 to
YACs 428C5, 76C1, 81B11 and 27H5 with allele sizes of 5,
4, 4 and 3, and 4 respectively. PCR amplification of
genomic DNA revealed up to four alleles per individual,
indicating as many as two copies per chromosome. D5F150
was present at two locations within the cosmid array, yet
only one location was detected in the YAC contig. D5F151
was not detected within our cosmid array; nevertheless it
was placed at the centromeric end of YAC 33H10, which
encompasses the cosmid array, based on the positive
amplification of YAC 428C5. One location of D5F149 was
detected on both our cosmid and YAC clones. Our data
suggested, as with CATT-1, the existence of null alleles
and/or instability of the CMS-1, D5F150, D5F151 and D5F149
sequences in YACs.
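
The step from allele counts to copy number used above is simple
arithmetic, sketched here in Python. It assumes a diploid
individual carrying two chromosome 5 homologues and that each
copy of the STR contributes at most one band; the function name
is hypothetical.

```python
import math

def min_copies_on_one_chromosome(distinct_alleles):
    """Smallest copy number that at least one homologue must carry.

    With n distinct alleles spread over two homologues, one of
    them must hold at least ceil(n / 2) copies of the STR.
    """
    return math.ceil(distinct_alleles / 2)

print(min_copies_on_one_chromosome(4))  # CMS-1: up to four alleles
print(min_copies_on_one_chromosome(8))  # CATT-1: up to eight alleles
```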
A deletion event was observed in hybridization with
an 800 bp EcoRI fragment isolated as a single copy probe
from the CATT-40G1 containing cosmid 234A1 from the
chromosome 5 specific cosmid library. Probings of YAC
DNA failed to detect this fragment in any of our YACs.
Hybridization to genomic DNA of several individuals did
not identify any deletion events; thus this sequence may
be susceptible to instability in the YACs. Sequencing of
this fragment did not reveal any exons or coding region.

Further evidence of sequence duplication in the SMA
region was identified with a 1.2 kb internal Alu-PCR
product (p151.2) from cosmid 15F8 (Figure 4). The probe
identified three EcoRI fragments in YAC clones 76C1,
81B11 and 27H5 (20 kb, 12 kb and 3 kb) but only one in
33H10 and 24D62 (20 kb) and one in 428C5 (12 kb). An
internal EcoRI site divided this marker into 500 bp and
700 bp probes. The larger probe identified the 12 kb and
20 kb fragments while the smaller probe identified the 3
kb and 20 kb fragments (Figure 5). We ruled out
instability of this sequence in YACs as they are from
different libraries and the hybridization patterns
reflected their physical location. The 12 kb and 3 kb
fragments were localized on the EcoRI restriction map;
however, we were unable to position the 20 kb fragment.
Taken together, these findings suggest the 12 kb and 3 kb
fragments lie in tandem with a centromeric/telomeric
orientation respectively. A location of the 20 kb fragment
distal to our contiguous array of cosmids may be inferred
from the data. The duplication was confirmed by
hybridization to genomic DNA digests revealing all three
fragment sizes.

YAC Contig and Cosmid Contig Characteristics

We established a YAC contig of the SMA disease gene
region, incorporating the D5S435-D5S112 interval and
encompassing 4 Mb. Orientation of the contig along 5q13
was confirmed by analysis of seven genetic markers and
STSs in combination with PFGE analysis. The long range
restriction map revealed neither major deletions nor
rearrangements among the YACs within our contig, and was
utilized to refine the estimates of the size of the
contig. Our YAC map establishes physical linkage of the
markers D5S629, D5F153, D5F151, D5F150, D5F149, CMS-1,
CATT-1 and D5S557 to a 1.1 Mb region, a region of the
genome characterized by low copy repetitive sequences and
multilocus STRs. Furthermore, we estimated the new
genetically defined CMS1-SMA-D5S557 interval to be 550 kb.

Estimates of the physical distance of the D5S435-D5S557
interval ranging from 400 kb (Francis et al., 1993) to
1.4 Mb (Wirth et al., 1993) have been reported. In
contrast to these studies, our estimation of 1.4 Mb for
the D5S435-SMA-D5S557 interval and 550 kb for the
CMS1-SMA-D5S557 interval employs clones derived from three
sources, comprising 6 chromosomes. Moreover, the
determination of both the size of clones and the position
of rare cutter sites has enabled us to determine more
precisely the extent of overlap of the YACs and the size
of the contig, providing a reliable estimation.

We also assembled a single contiguous array of
cosmid clones derived from both a chromosome 5 specific
library and a YAC (76C1) specific library in conjunction
with a restriction map of the
CMS-1/CATT-1/D5F153/D5F150/D5F149 region encompassing
210 kb. The repetitive sequences prevented extension of
the cosmid contig when utilizing a chromosome 5 specific
library, necessitating construction of a cosmid library
from YAC 76C1 in the critical region. The contiguous
cosmid array was constructed by a directed walking
strategy, with validation of cosmid overlap established by
restriction enzyme fragment overlap, Alu fingerprinting,
and analyses involving STSs, cosmid end clones and single
copy probes.

Physical and genetic mapping analyses revealed a
complex region of genomic DNA comprising duplications and
the presence of repetitive sequences. Genotyping of
genomic DNA with complex STRs from this region revealed
the presence of a polymorphic number of bands ranging as
high as eight per individual. This suggested the
presence of multiple copies, or subloci, for the STRs
CATT-1, CMS-1, D5F153 and D5F150. Our physical mapping
data confirmed the presence of these subloci except in the
case of D5F151 and D5F149, which revealed only one
location. Four of the CATT-1 subloci map to our cosmid
array within a 140 kb region; at least one of these
subloci, CATT-40G1, is duplicated. D5F153 and CATT-1 are
related STRs which appear to have diverged from a common
ancestor. We had localized one CMS-1 sublocus to our
cosmid array; however, we were unable to determine from
our data whether other subloci exist on other chromosomes
within this 200 kb interval, as the chromosomes from
which the YAC/cosmid libraries were derived may either
contain null alleles at the remaining subloci or have
sustained deletions.

The CATT-1, D5F153, D5F150 and D5F149 STRs, although
present in multiple copies on chromosomes in the
population, were observed as single sublocus markers on
all YACs, as evidenced by single allele PCR products for
each, suggesting instability and deletion of these
sequences. This is supported by the absence in our YACs
of an 800 bp fragment derived from the chromosome 5
cosmid library based contiguous array. Instability of
these sequences does not appear to result in large
deletions, as additional unique sequence probes located
between the multiple subloci are retained in the YACs.

In summary, we have produced the first high
resolution physical map of the critical SMA region.
However, delineation of the precise region which
contained the SMA gene was not possible based on this
information alone.

Concurrent with our genetic analysis, we constructed
a YAC contiguous array employing clones from three
different YAC libraries (Roy et al., 1994). A minimal
representation from this array, which was correlated with
extensive pulsed field gel electrophoresis (PFGE)
analysis, is shown in Figure 9B.

With the initial suggestion of linkage
disequilibrium of the general CATT marker and SMA
(Burghes et al., 1994), the construction of a cosmid
contiguous array incorporating the extended CATT region
was undertaken. The presence of extensive and
polymorphic genomic repetitive elements mapping both to
5q13 and elsewhere on chromosome 5 interfered with a
straightforward assembly of a contiguous array. However,
the integrity of the array was established by restriction
enzyme analyses, Alu-PCR fingerprinting, STS content
determination and nucleic acid hybridization using cosmid
end clones and other single copy probes. This resulted
in the generation of an array encompassing 220 kb that
contained the five CATT subloci contained in a
monochromosomally derived flow sorted chromosome 5 genomic
library (Roy et al., 1994). More recently, a P1
artificial chromosome (PAC, Ioannou et al., 1994)
contiguous array containing the CATT region, comprising
10 clones and extending approximately 550 kb, was
constructed (Figure 9C).

Linkage Disequilibrium Analysis 
A linkage disequilibrium analysis employing 5
complex and simple tandem repeats mapping to the SMA
region was conducted. Two of the polymorphisms employed
in this analysis were the CATT-40G1 and CATT-192F7
subloci which we mapped to our cosmid array. Specific
amplification of the two individual subloci was achieved
by constructing primers ending on sequence polymorphisms
in the region flanking the CA repeat. A clear linkage
disequilibrium peak was observed at the CATT-40G1
sublocus as shown in Figure 6.
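
The text does not state which disequilibrium statistic was
computed. Purely as an illustration, the Python sketch below
shows one common choice, Lewontin's D', for a two-by-two table of
chromosome counts (SMA versus normal chromosomes, carrying or
not carrying a given marker allele). The function name and the
counts are hypothetical and are not data from Figure 6.

```python
def abs_d_prime(n_AB, n_Ab, n_aB, n_ab):
    """Absolute value of Lewontin's D' from four haplotype counts.

    A/a: SMA / normal chromosome; B/b: marker allele present / absent.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = n_AB / total - p_A * p_B          # raw disequilibrium D
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return 0.0 if d_max == 0 else abs(d) / d_max

# Invented counts: the marker allele is enriched on SMA chromosomes.
print(round(abs_d_prime(40, 10, 10, 40), 3))
```

A value near 0 indicates no association and a value near 1
indicates that the allele is found almost exclusively on one
class of chromosome; a peak across markers would then point to
the marker closest to the mutation.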
PAC Contig Array

Since the 40G1 CATT subloci demonstrated linkage
disequilibrium, a PAC contiguous array containing the
CATT region was constructed. This PAC contig array
comprised 9 clones and extended approximately 400 kb
(Figure 7). Our genetic analysis combined with the
physical mapping data indicated that the 40G1 CATT
subloci marker which showed the greatest disequilibrium
with SMA was duplicated and was localized at the extreme
centromeric end of the critical SMA interval. Consequently,
the 154 kb PAC clone 125D9, which contained within 10 kb
of its centromeric end the SMA interval defining CMS
allele 9 and extended telomerically to incorporate the
40G1 CATT sublocus, was chosen for further examination.

Two genomic libraries were constructed by performing
complete and partial (average insert size 5 kb) Sau3AI
digests on PAC 125D9 and cloning the restricted products
into BamHI digested Bluescript plasmids. Genomic
sequencing was conducted on both termini of 200 clones
from the 5 kb insert partial Sau3AI library in the manner
of Chen et al. (1993), permitting the construction of
contiguous and overlapping genomic clones covering most of
the PAC. This proved instrumental in the elucidation of
the neuronal apoptosis inhibitor protein gene structure.

PAC 125D9 is cleaved into 30 kb centromeric and 125
kb telomeric fragments by a NotI site (which was later
shown to bisect exon 7 of the PAC 125D9 at the beginning
of the apoptosis inhibitor domain). The NotI PAC
fragments were isolated by preparative PFGE and used
separately to probe fetal brain cDNA libraries. Physical
mapping and sequencing of the NotI site region was also
undertaken to assay for the presence of a CpG island, an
approach which rapidly detected coding sequences. The
PAC 125D9 was also used as a template in an exon trapping
system, resulting in the identification of the exons
contained in the neuronal apoptosis inhibitor protein
gene.
The multipronged approach, in addition to the
presence of transcripts identified previously by
hybridization by clones from the cosmid array (such as
GA1 and L7), resulted in the rapid identification of six
cDNA clones contained in the neuronal apoptosis inhibitor
protein gene. The clones were arranged, where possible,
into overlapping arrays. Chimerism was excluded on a
number of occasions by detection of co-linearity of the
cDNA clone termini with sequences from clones derived
from the PAC 125D9 partial Sau3AI genomic library.

Cloning of Neuronal Apoptosis Inhibitor Protein Gene 
In the meantime, a human fetal spinal cord cDNA
library was probed with the entire genomic DNA insert of
cosmid 250B6 containing one of the 5 CATT subloci. This
resulted in the detection of a 2.2 kb transcript referred
to as GA1, whose location is shown in Figure 7. Further
probings of fetal brain libraries with the contiguous
cosmid inserts (cosmids 40G1) as well as single copy
subclones isolated from such cosmids were undertaken. A
number of transcripts were obtained, including one termed
L7. No coding region was detected for L7, probably due to
the fact that a substantial portion of the clone
contained unprocessed heteronuclear RNA. However, we
later discovered that L7 comprises part of what
is believed to be the neuronal apoptosis inhibitor
protein gene. Similarly, the GA1 transcript ultimately
proved to be exon 13 of the neuronal apoptosis inhibitor
protein. Since GA1 was found to contain exons, indicating
that it was an expressed gene, it was of particular
interest. The GA1 transcript, which was contained within
the PAC clone 125D9, was subsequently extended by further
probing in cDNA libraries.

The extended GA1 transcript was compared to other
known sequences to reveal that its amino acid sequence
had significant homology to the inhibitor of apoptosis
polypeptides of Orgyia pseudotsugata and Cydia pomonella
viruses (Table 3). This sequence analysis revealed the
presence of inhibitor of apoptosis protein homology in
exons 5 and 6.

The remaining gaps in the cDNA were completed and
the final 3' extension was achieved by probing a fetal
brain library with two trapped exons. A physical map of
the cDNA with overlapping clones was prepared. The
entire cDNA sequence is shown in Table 4 and contains
sixteen exons. The amino acid sequence starts with
methionine, which corresponds to the nucleotide triplet
ATG. Figure 8 demonstrates the structural organization
of the SMA gene.
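
As an illustration of how an amino acid sequence is read from a
cDNA starting at the initiating ATG, the Python sketch below
translates a sequence with the standard genetic code. The
function name and the short test sequence are hypothetical; the
sequence is not taken from Table 4.

```python
# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate_from_first_atg(cdna):
    """Translate from the first ATG up to, but not including, a stop codon."""
    seq = cdna.upper()
    start = seq.find("ATG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X: unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)

print(translate_from_first_atg("GGATGGCTTAAGG"))
```

Taking the first ATG as the start is a simplification made for
this sketch; in practice the initiating codon is chosen from the
reading frame and its sequence context.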

The cDNA sequence of NAIP shown in Table 4 allows
one skilled in the art to develop from this gene
primers, probes and also antibodies against the protein
product. The cDNA sequence of Table 4 may be used in
recombinant DNA technology to express the sequence in an
appropriate host in order to produce the neuronal
apoptosis inhibitor protein. In this manner, a source of
neuronal apoptosis inhibitor protein is provided. Given
the sequence of NAIP and the probes and primers therein,
deletions in the sequence may also be detected, for
instance, in the disorder Spinal Muscular Atrophy.

NAIP Structure

The NAIP gene contains 17 exons comprising at least
5.5 kb and spans an estimated 80 kb of genomic DNA. The
NAIP coding region spans 3698 nucleotides, resulting in a
predicted gene product of 1233 amino acids. NAIP
contains two potential transmembrane regions and an
intracellular inhibitor of apoptosis domain immediately
contiguous with a GTP binding site. Searches of the
protein domain programs generated the following results:

(i) residues 9-91: an N terminal domain with no
recognizable motifs.

(ii) residues 94-118: hydrophobic potential
membrane spanning domain.

(iii) residues 169-485: a domain which shows
homology with apoptosis inhibitors and is followed,
immediately before the next hydrophobic domain, by a
GTP/ATP binding site.

(iv) residues 486-504: a hydrophobic potential
membrane spanning domain.

(v) residues 505-1005: possible receptor domain
containing 4 N-linked glycosylation sites and a
lipoprotein binding domain.

Neuronal Apoptosis Inhibitor
Protein Gene Mutational Analysis

A cDNA 20.3 probe was found by using the entire PAC
125D9 as a probe to screen cDNA libraries. Probing of
genomic Southerns with cDNA probe 20.3 revealed the
absence of a 9 kb EcoRI band in a Type III consanguineous
family. This information mapped the NAIP gene deletions
to exons 5 and 6. Thus the deletion covers the exon
containing the rare NotI restriction site and the exon
immediately downstream. Primers in and around these
exons were constructed, revealing the absence of
amplification from 3 Type I and 3 Type III SMA
individuals. Genomic DNA was isolated from PAC and
cosmid subclones in and around exons 4 and 5 and
sequenced in an effort to generate primers which would
amplify the junction fragment generated by the causative
deletions as depicted. A junction fragment was detected
in the Type III individual. A similar product was
observed in two other French Canadians with no history of
consanguinity. The 3 Type I and 3 Type III SMA
individuals' chromosomes had identical CATT/CMS
haplotypes, strongly suggesting that this is a common mild
SMA mutation and comparatively frequent in the French
Canadian population. Cosegregation of this pattern was
demonstrated. We have conducted analysis of 110 parents
of SMA individuals and have failed to find a similar
product. Sequencing of the genomic DNA in this region
revealed an approximately 10 kb deletion resulting in an
in-frame deletion. This deletion spans intron regions
and exons 5 and 6. Southern blot analysis of two
generation SMA families was performed. A cDNA probe
encompassing the first eight exons was hybridized to
EcoRI-digested DNA from peripheral blood leukocytes. SMA
affected members show an absence of hybridization to a 10
kb EcoRI band which was shown to contain exons 5 and 6
(Figure 9).

Initial isolation of the NAIP transcript was
achieved by probing a human fetal brain cDNA library with
the entire 28 kb genomic DNA insert of cosmid 250B6 that
contains one of five CATT subloci present in the cosmid
library. This resulted in the detection of a 2.2 kb
transcript that ultimately proved to be exon 14 of the
NAIP gene. Further probing of fetal brain libraries with
the contiguous cosmid inserts (cosmid 40G1), as well as
single copy subclones isolated from such cosmids,
identified a number of transcripts including the L7
transcript that ultimately proved to contain exon 13 of
the NAIP locus. No coding region was detected for L7,
probably due to the fact that a substantial proportion of
the clone contained unprocessed heteronuclear RNA,
obscuring its true nature.

At this stage, the completed genetic and linkage
disequilibrium analyses and construction of the PAC
contiguous array identified PAC 125D9 as having a good
probability of containing the SMA locus. Four PAC 125D9
genomic libraries were constructed by performing
complete and partial (average insert size 5 kb) Sau3AI,
BamHI and BamHI/NotI digests on the PAC insert and
cloning the restricted products into plasmid vector.
High throughput genomic sequencing was conducted on both
termini of 200 clones from the 5 kb insert partial Sau3AI
digestion library in the manner of Chen et al. (1993),
permitting the construction of contiguous and overlapping
genomic clones covering most of PAC 125D9 (data not
shown). This has proven instrumental in elucidating the
[...]

In addition to the NH2 terminal IAP domain, there
exist cysteine and histidine rich zinc finger-like
motifs in the carboxy terminus of both CpIAP and OpIAP.
These motifs, which are proposed to interact with DNA
(Birnbaum et al., 1994), are not seen in NAIP (Table 4).
NAIP contains two potential transmembrane regions that
bracket an inhibitor of apoptosis domain and a contiguous
GTP binding site. Additional searches of protein domain
programs generated the following results, which are more
specific than the aforementioned protein domain evaluation.

1. Residues 1-91: an N terminal domain with no
recognizable motifs;

2. Residues 92-110: a hydrophobic domain predicted
by the MEMSAT program (Jones et al., 1994) to
be a membrane spanning domain;

3. Residues 163-477: a domain that shows homology
with baculoviral inhibitor of apoptosis
proteins followed by, and immediately upstream
of the next hydrophobic domain, a GTP/ATP
binding site;

4. Residues 479-496: a hydrophobic domain predicted
by MEMSAT to be a membrane spanning domain;

5. Residues 497-1232: a possible receptor domain
containing four N-linked glycosylation sites
and a procaryotic lipid attachment site.
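
The five residue ranges listed above can be held in a small
lookup table, sketched here in Python, for checking which
predicted domain a given residue falls in. The structure and the
function name are hypothetical conveniences; only the residue
ranges and descriptions come from the list.

```python
# (first residue, last residue, description), as listed in the text.
NAIP_DOMAINS = [
    (1, 91, "N terminal domain with no recognizable motifs"),
    (92, 110, "hydrophobic domain, predicted membrane spanning"),
    (163, 477, "baculoviral IAP homology with GTP/ATP binding site"),
    (479, 496, "hydrophobic domain, predicted membrane spanning"),
    (497, 1232, "possible receptor domain"),
]

def domain_of(residue):
    """Return the description of the domain containing a residue, or None."""
    for first, last, description in NAIP_DOMAINS:
        if first <= residue <= last:
            return description
    return None  # residue lies between, or outside, the listed ranges

print(domain_of(100))
print(domain_of(130))
```

Residues 111-162 and 478 are not assigned to any listed range,
so the lookup returns None for them.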

We know of at least three exons that comprise 400 bp
of 5' untranslated region (5' UTR); it is possible that
more exist. A striking feature of this region is the
presence of a perfect duplication of a 90 bp region in
the 5' UTR before exon 2 and in the region bridging exons
2 and 3 (Table 4). In addition, the 3' untranslated
region comprising exon 17 has been found to contain a 550
bp interval that has potential coding region detected by
the GRAIL program with high homology (P=1.1e-37) to the
chicken integral membrane protein, occludin (Furuse et
al., 1993). There exists the possibility that this
represents a chimeric transcript. Occludin homologous
sequence has been detected in four different cDNA clones
and two isoforms of the gene. The possibility of the
occludin sequence representing a coding exon of the NAIP
gene, with the putative 3' UTR actually being
heteronuclear RNA, is also unlikely given the consistency
with which the 3' UTR is observed and the presence of
in-frame translational stop codons mapping upstream of the
region of occludin homology. Preliminary RT-PCR analysis
indicates that the occludin tract is transcribed.

Tissue Expression 

Hybridization of a Northern blot containing adult
tissue mRNA with an exon 14 probe detected bands only in
adult liver (approximately 6 and 7 kb bands) and placenta
(7 kb, Figure 6). Although the level of expression in
adult CNS is not sufficient to result in visible bands on
Northern analysis, successful reverse transcriptase-PCR
(RT-PCR) amplification of the NAIP transcript using
spinal cord, fibroblast and lymphoblast RNA suggests
transcriptional activity in these tissues.

Detection of Truncated and Internally 
Deleted Versions of the NAIP gene 

In the analysis of the PAC contig, the clones 238D12
and 30B2 were noted to show significant sequence
similarity with 125D9 but not to contain the NotI site in
PAC 125D9 that is located in NAIP exon 6. This indicated
the possibility of duplicated copies of the NAIP gene and
so further analysis by hybridization of Southern blots
containing PAC DNA with NAIP exon probes and PCR STS
content assessment was undertaken. In this manner, two
aberrant versions of the NAIP locus were detected, one
with exons 2 to 7 deleted (PAC 238D12), and another with
exons 6, 7 and 12 to 15 deleted (PACs 30B2 and 250I7).
The presence of identical sized bands in both genomic and
PAC DNA on Southern blot analysis, as well as PCR results
outlined below, obviate the possibility that the deletions
represent in vitro PAC artifacts rather than the in vivo
situation. Thus, genomic DNA Southern blots hybridized
with NAIP exon probes revealed more bands than would be 

10 expected with a single intact copy of the NAIP gene. For 
example, probing of blots containing BamHI restricted 
genomic DNA with NAIP exons 3-11 should lead to a single 
band comprised of equal sized contiguous 14.5 kb BamHI 
fragments in the intact NAIP locus (Figure 11) ♦ Instead, 

15 two additional bands are seen at 9,4 and 23 kb (Figure 
14) , fragments that are seen in PACs 2 3 8D12 and 
30B2/250I7 respectively. The 9-4 fragment BamHI has been 
subcloned from a cosmid and found to contain exons 8-11 
with a deletion incorporating exons 2 to 7 occurring just 

20 upstream of the 8th exon (Figure 11) . The 23 kb band is 
generated by a 6 kb deletion removing a BamHI site 
leading to the replacement of the two contiguous 14.5 kb 
BamHI fragments with a 23 BamHI fragment containing exons 
2 to 5 and 8 to 11 and lacking exons 5 and 6 as depicted 

25 in Figure 11. The left side of this deletion was mapped 
by the fact that amplification with primers 1933 and 1926 
generated a product whereas PCR with 1933 and 1923 did 
not (data not shown) . PCR employing primers 1927 and 
1933, constructed to amplify a 4,2 kb junction fragment 

30 spanning the 6 kb deletion (Figure 11) , generated the 
appropriate product as shown by size and sequencing in 
both genomic DNA and PACs 30B2/250I7. The variable 
dosage of both the 9.4 and 23 kb bands seen in genomic 
DNA from different individuals indicates that the two 

3 5 partially deleted versions of the NAIP gene are present 
in multiple and polymorphic number in the general 
population. 
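
The fragment arithmetic behind the 23 kb band can be checked with a
short sketch. It assumes BamHI sites at 0, 14.5 and 29 kb (two
contiguous 14.5 kb fragments) and a 6 kb deletion spanning the middle
site. The breakpoint coordinates (12 and 18 kb) and the function name
are illustrative assumptions only; the exact breakpoints are not given
in the text.

```python
def fragments_after_deletion(sites_kb, del_start_kb, del_end_kb):
    """Return restriction fragment sizes (kb) after deleting [del_start_kb, del_end_kb)."""
    deleted_len = del_end_kb - del_start_kb
    new_sites = []
    for pos in sorted(sites_kb):
        if pos < del_start_kb:
            new_sites.append(pos)                # upstream of the deletion: unchanged
        elif pos >= del_end_kb:
            new_sites.append(pos - deleted_len)  # downstream: shifted by the deleted length
        # sites inside the deleted interval are lost
    return [b - a for a, b in zip(new_sites, new_sites[1:])]


# Assumed site map for the intact locus: two contiguous 14.5 kb BamHI fragments.
intact_sites = [0.0, 14.5, 29.0]

intact = fragments_after_deletion(intact_sites, 0.0, 0.0)     # no deletion
deleted = fragments_after_deletion(intact_sites, 12.0, 18.0)  # hypothetical 6 kb deletion

print(intact)   # fragment sizes for the intact locus
print(deleted)  # fragment sizes once the middle site is removed
```

Any 6 kb deletion that removes the middle site gives the same single
fragment of 14.5 + 14.5 - 6 = 23 kb, whatever the exact breakpoints.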




A further level of complexity was detected with the
identification of clones from a non-SMA human fetal brain
cDNA library deleted for exons 11 and 12 (Scheme #1),
some of which also had exons 15 and 16 (Scheme #1) absent
(Figure 10). The fact that these deletions result in
frame shifts and premature protein truncation indicates
that they are, rather than normal splicing variants, more
likely the result of transcription of the deleted and
truncated versions of the NAIP gene that are present in the
general population (Figure 11). In all, a profile of a
region containing a variable number of copies of
internally deleted and truncated versions of the NAIP
locus, some of which are transcribed, has emerged from
our analysis.

Probing of blots containing DNA from the somatic
cell hybrid HHW 1064 (Gilliam et al., 1989) with NAIP
exonic probes indicates that all forms of the NAIP gene
are confined to the 30 Mb deleted region of 5q11-13.3
contained in the derivative chromosome 5 of this cell
line. This finding has been confirmed by FISH probing
with a NAIP exon 13 probe (unpublished data).

NAIP Gene Mutational Analysis 

Probing of genomic Southern blots with PCR amplified
NAIP exons 3 to 10 revealed the absence of a 4.8 kb
EcoRI/BamHI fragment containing exons 5 and 6 in the four
affected individuals of consanguineous Type III SMA
family 24561 (Figures 11 and 14). The same probing of
BamHI digested DNA from this family revealed the absence
of a 14.5 kb band, also in keeping with a loss of exons 5
and 6 as outlined above (Figures 11 and 14). Similar
results were observed in two other French Canadian SMA
families that were also believed to be consanguineous.

In order to confirm the proposed deletion of exons 5
and 6, primers homologous to these exons were made
(primers 1893, 1864, 1863, 1910 and 1887, identified by
arrows in Figure 11). Results of a representative PCR
amplification of DNA from family 24561 and a second
Type III SMA consanguineous family using exon 5 specific
primers (primers 1864 and 1863), along with a simultaneous
reaction of an exon 13 sequence included to rule out a
failure of the PCR, are shown in Figure 15. Absence of
amplification of exon 5 can be seen to cosegregate with
the SMA phenotype.

In order to determine if the exon 5 and 6 NAIP gene
deletion was an SMA mutation, Southern blot analysis was
conducted. An 800 bp EcoRV single copy probe that mapped
immediately to the 3' side of the 6 kb exon 5 and 6
deletion was employed (Figure 11). Hybridization of this
marker to EcoRI Southern blots detected both a 9.4 kb
EcoRI fragment containing exons 5 and 6 from the intact
NAIP locus as well as a 3 kb EcoRI band from the exon 5
and 6 deleted copy of the NAIP gene. Analysis was
conducted on EcoRI Southern blots containing DNA from
over 900 unrelated members of myotonic dystrophy, ADPKD
and cystic fibrosis families obtained from our DNA
diagnostic laboratory. The 9.4 kb band was seen in all
individuals, in keeping with the presence of at least one
copy of exons 5 and 6 in each of the approximately 900
individuals tested. In addition, the 3 kb band was
observed in every individual, reflecting a virtually
complete dispersion of some form of the exon 5 through 6
deleted NAIP gene in the general population. Moreover,
the variable band dosage observed for the 3 kb band
suggested that the number of copies of the exon 5-6
deleted NAIP gene is polymorphic, possibly ranging as high
as 4 or 5 copies per genome.

PCR analysis was then extended to 110 SMA families,
employing exon 5 and 6 primers. Seventeen of 38 (45%)
Type I SMA individuals and 13 of 72 (18%) Type II and III
SMA individuals were homozygously deleted for these
exons. Assuming random assortment of chromosomes, and
therefore taking the square root of the observed frequency
of homozygous exon 5 through 6 deleted individuals, yields
estimated frequencies for exon 5 through 6 deleted
chromosomes of 67% in Type I SMA and 42% in Type II/III
SMA. PCR analysis next conducted on 168 parents of
SMA children revealed failure of amplification, suggesting
homozygous deletion of exons 5 and 6, in three individuals.
This finding was confirmed by Southern analysis in the
two cases with sufficient DNA for this assay. The two
individuals, aged 28 and 35 and both parents of Type I
SMA children, when interviewed by telephone described
themselves as physically well, reporting no symptoms
suggestive of SMA. It was thus concluded that the
deletion of NAIP exons 5 through 6 in isolation, while
possibly reflecting more severe deletions in individuals
with SMA as outlined below, can be clinically innocuous,
associated either with an exceedingly mild SMA or even a
normal phenotype. Clinical assessment of these
individuals is currently being undertaken.
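
The chromosome frequency estimates quoted above can be reproduced with
a few lines of arithmetic. This is a minimal sketch, assuming (as the
text does) random assortment, so that the homozygote frequency equals
the square of the deleted-chromosome frequency; the function name is
illustrative.

```python
import math


def deleted_chromosome_frequency(homozygous_deleted, total):
    """Estimate q from the observed homozygote frequency q**2 (random assortment assumed)."""
    return math.sqrt(homozygous_deleted / total)


# Counts taken from the text: 17 of 38 Type I and 13 of 72 Type II/III individuals.
type1 = deleted_chromosome_frequency(17, 38)
type23 = deleted_chromosome_frequency(13, 72)

print(f"Type I: {type1:.0%}, Type II/III: {type23:.0%}")
```

The estimate is only as good as the random-assortment assumption, which
the consanguineous families described earlier would not satisfy.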

Judging both by the cDNA clones detected from fetal
brain libraries as well as the make-up of RT-PCR NAIP
products (Figure 2), many and possibly all truncated
copies of the NAIP gene appear to be transcribed. Given
the apparently unaffected status of the three parents of
individuals with SMA who do not have a copy of exons 4
and 5 in their genome, we believe that the exon 5 through
6 deleted version of NAIP is also translated. In keeping
with this model, removal of exons 5 and 6 results in an
in-frame deletion that extends the longest NAIP open
reading frame upstream to a start methionine in exon 3 at
nucleotide 211 (Table 4).

Furthermore, the protein sequence encoded by the
deleted exon 5 and 6 IAP motif is approximately 35%
homologous to the IAP motif encoded in exons 10 and 11,
possibly accounting for the absence of a discernible
phenotype in the three exon 5 through 6 deleted
individuals. One possible model is that a single copy of
exon 5 through 6 deleted NAIP on each chromosome results
in the mild SMA phenotype, while individuals with greater
than 3 or 4 copies of the exon 4-5 deleted NAIP locus are
clinically unaffected. The possibility that duplication
of the SMA gene underlies the disease has recently been
proposed by DiDonato et al. (1994).

RT-PCR amplification of RNA from SMA and non-SMA
tissue. The results of RT-PCR amplification using RNA
from both non-SMA and SMA individuals as template are
shown in Figure 16.

We have established that at least some of the
internally deleted and truncated NAIP versions are
transcribed. In order to distinguish transcripts
from the intact NAIP gene, which would produce a
functional protein, from those that would not, an effort
was made to RT-PCR amplify transcripts that were as large
as possible. Given the 2.2 kb size of exon 14, this was
found to be one which encompassed exon 2 and the 5' end
of exon 13. No product was detected at the level of
ethidium bromide staining after first round PCR.
Therefore, second round nested amplification was
undertaken as described previously for Figure 16.

A representative subset of RT-PCR experiments is
shown in Figure 16. PCR of reverse transcribed product
using RNA from non-SMA tissues as template and reverse
transcribing from exons 10 or 13 consistently amplified
product of the expected size. In contrast, similar
RT-PCR experiments on RNA from SMA tissue revealed no
amplification in five cases, in keeping with marked
down regulation or complete absence of the intact
transcript in such individuals (Figure 16A). The RNA
obtained from the SMA tissues was no more than 12 hours
post mortem. As we have no difficulty in amplifying
intact NAIP transcript from normal tissue which is 24 hr
post mortem, we do not believe the difficulty in
amplification arises from RNA degradation. Furthermore,
difficulty with amplification was seen for all SMA
tissues, which argues against the possibility that NAIP
is transcribed solely in the motor neuron, with depletion
of this cell type in SMA resulting in RT-PCR failure in
spinal cord tissue.

In the cases where amplification was observed,
sequencing of RT-PCR products has revealed the following
findings, as shown in Figures 16A, 16B and 16C:

(i) an in-frame deletion of codons 153 and 190 from
the 3' end of exon 5 from sample a9.

(ii) deletion of exon 6 resulting in a frame shift
with a stop codon occurring 73 nucleotides into exon 7 in
a product amplified by exon 5 primer 1864 and exon 13
primer 1974 from sample a2.

(iii) an approximate 50 nucleotide insertion in a
product amplified by exon 4 primer 1886 and exon 13
primer 1974 from sample a7.

(iv) deletion of a glutamic acid codon number 158 in
exon 5 in association with deletion of exons 11 and 12 in
a product amplified by exon 5 primer 1864 and exon 13
primer 1974 from sample a3.

(v) deletion of exons 11 and 12 introducing a frame
shift and a stop codon 14 nucleotides into exon 13 in a
product amplified by exon 9 primer 1844 and exon
13 primer 1974 in samples a2, a3, a9 and a11.
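
The distinction drawn in the list above between in-frame deletions and
frame shifts that introduce a premature stop codon can be illustrated
with a small sketch. The toy sequence below is hypothetical and is not
NAIP sequence; the function names are likewise illustrative
assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def preserves_frame(deleted_nt):
    """A deletion keeps the downstream reading frame only if its length is a multiple of 3."""
    return deleted_nt % 3 == 0


def first_stop_offset(seq, start=0):
    """Return the 0-based offset of the first in-frame stop codon at or after start, or None."""
    seq = seq.upper()
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i
    return None


# Hypothetical toy transcript (not NAIP): ATG GCT GAC CTT GGA TAA
toy = "ATGGCTGACCTTGGATAA"

in_frame = toy[:3] + toy[6:]  # delete one whole codon (3 nt): frame kept
shifted = toy[:3] + toy[5:]   # delete 2 nt: frame shifted

print(first_stop_offset(toy))       # natural stop of the toy transcript
print(first_stop_offset(in_frame))  # same stop codon, 3 nt earlier
print(first_stop_offset(shifted))   # premature stop created by the frame shift
```

In this toy case the 2 nt deletion reads ATG TGA, so translation would
terminate immediately after the start codon, whereas the 3 nt deletion
simply removes one residue.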

In all, employing PCR on material reverse
transcribed from exon 13, we have observed successful
amplification of the appropriate product from all 12
non-SMA tissues attempted and in only one of 12 SMA
tissues. In the latter case, sample a12, amplification
was from exons 13 to 4 only; whether the transcript also
incorporates exons 2 to 3 or 14 to 17 is unknown. We
believe that these data provide strong evidence for NAIP
being the SMA gene.
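
As an illustration of how strongly the counts above (12 of 12 non-SMA
versus 1 of 12 SMA tissues amplifying) depart from chance, a one-sided
Fisher exact test can be sketched. This test is not part of the
original analysis; it is added here only as a hedged worked example,
and the function name is an assumption.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of tables with at least `a`
    successes in the first row, holding all margins fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(row2, col1 - x) / total
    return p


# Rows: non-SMA, SMA. Columns: amplified, not amplified (counts from the text).
p_value = fisher_one_sided(12, 0, 1, 11)
print(f"one-sided p = {p_value:.2e}")
```

The test treats the 24 tissues as independent samples, which is an
assumption the text does not address.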

Role of NAIP Protein 
The discovery of a neuronal apoptosis inhibitor
protein gene in the SMA region of chromosome 5
demonstrates that the SMA condition is a result of
deletions in the apoptosis inhibitor protein domains.
The long-term survival of motor neurons is dependent on
the production of complete neuronal apoptosis inhibitor
protein. The deletion of the apoptosis inhibitor protein
domain compromises the protein activity. We have
demonstrated that approximately 70% of all SMA affected
individuals have deletions of exons 5 and 6 of the NAIP
gene on chromosome 5.

The identified region of 5q13.1 contains a variable
number of copies of intact and partially deleted forms of
the NAIP gene. While we cannot rule out the presence of
additional loci in 5q13.1 that when mutated contribute to
the SMA phenotype, we believe that mutations of the NAIP
gene are necessary and possibly sufficient for the genesis
of SMA. In contrast to most autosomal recessive diseases,
where causal mutations are usually detected in the single
copy of a given gene, we propose that an SMA chromosome
is characterized by a paucity or, for severe SMA
mutations, an absence of both the intact NAIP gene as
well as that version which has had exons 3 and 4 deleted.
The genesis of such chromosomes may involve unequal
crossovers leaving the chromosome depleted for these loci,
with the resulting absence of the NAIP gene product
leading to SMA.


Diagnosis of SMA 

The delineation of an SMA genotype in a given
individual is complicated by the unusual amplification of
the NAIP gene in the 5q13.1 region. Probings of Southern
blots containing genomic DNA with NAIP exon probes
invariably reveal bands resulting from copies of
internally deleted and truncated versions of the NAIP
gene. The presence of variable numbers of the different
forms of the NAIP locus in the general population is
therefore the norm and not diagnostic of an SMA mutation
per se, complicating the mutational analysis of the NAIP
gene. If the detection of genomic DNA containing altered
NAIP loci affords no proof of an SMA chromosome then, by
default, the search must be for the absence of the normal
NAIP gene. However, we have detected rare individuals
with no copies of exons 3-4 in their genome who are
clinically unaffected, an observation that is in keeping
with what we know of NAIP gene structure. Consequently,
the identification of an SMA chromosome is contingent on
the absence of both the intact as well as the exons 3-4
only deleted forms of NAIP. Assaying for their absence
is complicated by the presence of segments of normal NAIP
gene in each of the other, more extensively deleted,
forms of the NAIP locus. One can see, for example, that
if a given SMA individual had in their genome only the
deleted versions of NAIP found on PACs 238D12 and 30B2,
that is exons 1-6 deleted and exons 5, 6 and 11-14
deleted, respectively (Figures 10 and 11), they would
appear by PCR and Southern analysis to
have the exons 5-6 only deleted version of NAIP and
therefore to have non-SMA chromosomes. We believe that
many and perhaps most of the numerous exon 5-6 deleted
SMA individuals we have observed actually have
chromosomes with such a configuration, containing neither
the intact NAIP locus nor the exons 5-6 only deleted
version but rather some other combination of more
severely truncated/deleted versions of the locus, with
resultant absence of intact NAIP translation. Support
for this interpretation comes from our inability to
amplify normal NAIP transcripts employing RT-PCR on RNA
from Type I SMA tissue.

In all, the evidence in support of mutations in or
the absence of the NAIP gene causing SMA includes the
following:

(i) The strong possibility that NAIP, given its
homology with baculoviral IAPs, functions as an inhibitor
of apoptosis. This characteristic is wholly compatible
with the pathology of SMA. It is noteworthy that
mutations in a regulator of apoptosis have been
(Oppenheim 1991, Sarnat, 1992).

(ii) The mapping of the NAIP locus within the
recombination defined critical SMA interval and the fact
that the three polymorphic markers that have been shown
to be in strong linkage disequilibrium with Type I SMA,
CATT-40G1 (McLean et al., 1994), C272 (Melki et al.,
1994) and AG-1 (DiDonato et al., 1994), all map to PAC
125D9 and are present in NAIP introns (Figure 9C).

(iii) The nature of linkage disequilibrium observed
between the Type I SMA phenotype and the 5q13.1 markers.
We have shown that the CATT-40G1 CTR sublocus, which is
frequently duplicated on non-SMA chromosomes (Roy et al.,
1994), is deleted in 80% of Type I SMA chromosomes
compared with 45% of non-SMA chromosomes (McLean et al.,
1994). This finding is in keeping with a depletion of
the number of NAIP genes on SMA chromosomes. In a
similar fashion, Melki et al. (1994) have observed "a
heterozygote deficiency" consisting of a reduced number
of bands for the C272 CTR in Type I SMA, reflecting, they
propose, chromosomal deletions. DiDonato et al. (1994)
have also seen a striking reduction in the number of AG1
CTR subloci in Type I SMA individuals when compared with
non-SMA individuals. We believe that the observation by
three groups of the depletion of these intra-NAIP markers
on Type I SMA chromosomes fits well with the proposed
model of a lack or absence of both the intact and exon
5-6 deleted form of the NAIP gene underlying the disease.

(iv) The markedly increased frequency of NAIP exon
5-6 deletions observed in SMA chromosomes (approximately
67% of Type I SMA chromosomes and 42% of Type II/III SMA
chromosomes) compared with that detected for non-SMA
chromosomes (2-3%). As outlined above, we believe that
this phenomenon reflects the rarity or absence of both
the intact NAIP gene as well as the NAIP version with
only exons 5 through 6 deleted in the SMA chromosomes,
leaving only the more significantly internally deleted
and truncated forms of the NAIP gene present.

(v) Our consistent inability to RT-PCR amplify
appropriate size transcripts from RNA obtained from 11 of
12 SMA individuals despite success with 12 of 12 RNAs
from non-SMA individuals. Furthermore, sequencing of
those RT-PCR products that could be obtained from Type I
SMA material revealed a variety of mutations and
deletions.

(vi) The presence of a variable number of copies of
truncated and internally deleted versions of the NAIP
gene is similar to the situation reported in the
autosomal dominant polycystic kidney disease gene (ADPKD,
European Polycystic Kidney Disease Consortium, 1994). In
this case portions of unprocessed pseudogenes
corresponding to the causative gene were found to map
elsewhere on chromosome 16p. The key difference is that
with the NAIP locus the mutated form of the gene is
amplified.

In this regard the NAIP region of 5q13.1 has more
similarity to the area of chromosome 6 containing CYP21,
the gene that encodes steroid 21-hydroxylase (Wedell and
Luthman, 1993). CYP21, which when mutated causes an
autosomal recessive 21-hydroxylase deficiency, has been
observed in 0-3 copies in individuals. There also exists
in the region a variable number of inactive pseudogene
copies of CYP21 known collectively as CYP21P. The
majority of the CYP21 mutations that have been observed
in 21-hydroxylase deficiency can also be found in some
form of CYP21P and it is thought that the pseudogenes act
as a source of the mutations observed in CYP21. The
truncated and internally deleted NAIP genes are analogous
to CYP21P; however, instead of the gene conversion
postulated for CYP21/CYP21P, it is possible that unequal
crossing over results in chromosomes deleted for forms of
the NAIP gene that encode functional protein. The
existence of a polymorphic number of mutated NAIP genes on
5q13.1 is a credible mechanism for generation of SMA
chromosomes in this fashion.

Baculoviral IAPs

NAIP shows significant homology with the two
baculoviral gene products, CpIAP and OpIAP, that are
capable of inhibiting insect cell apoptosis (Table 4).
Insect cell apoptosis following baculoviral infection has
been well documented and is postulated to be a defence
mechanism. Premature death of infected insect cells
results in an attenuation of viral replication (Clem and
Miller, 1994a). CpIAP and OpIAP are thought to represent
baculoviral responses to this apoptotic mechanism. Both
act independently of other viral proteins to inhibit host
insect cell apoptosis, thereby permitting increased viral
proliferation (Clem and Miller, 1994a, 1994b). They are
known to be strongly similar only to each other; until
now no sequence similarities with cross-phyla proteins
have been reported. Their mode of action is unknown,
although some interaction with DNA has been postulated.

The role and cellular localization of NAIP have not
yet been established. However, we believe that the
significant sequence similarity between NAIP and the
baculoviral IAPs, especially over such a considerable
phylogenetic distance, combined with the previously
postulated role of inappropriate apoptosis in the
pathogenesis of SMA, makes it likely that NAIP serves as an
apoptosis inhibitor in the motor neuron. Transfection
assays employing NAIP both in insect and mammalian
neuronal cells will help in this regard.

One possibility is that specific ligand binding of
the carboxy terminus of the NAIP activates the GTP
binding site, which in turn activates the IAP domain. The
survival of a motor neuron might, therefore, be dependent
on the presence of the ligand(s): should the
concentration drop below a critical threshold, the IAP
domains cease to function with ensuing cell death. This
represents a possible mechanism for the natural winnowing
derived neurotrophic factor (BDNF) and CNTF (Mitsumoto et
al., 1994).

The role of the lipid attachment site in NAIP is
unknown. Similar sites have been known to serve as
prokaryotic protein leader sequences, usually situated in
the protein's amino terminus. We have detected the
consensus pattern in 218 human sequences in the
SWISS-PROT database (release 28). These sequences are
present in a variety of functional settings:
transmembrane regions, signal sequences, extracellular
and cytoplasmic domains. One possibility is that the
lipoprotein attachment site is extracellular and binds a
constituent of the Schwann cell proteolipid in a manner
that has been postulated for the apoptosis inhibiting
interaction of integrin with the extracellular matrix
(Meredith et al., 1993; Frisch and Francis, 1994).
Furthermore, the site may play a more active role in the
hepatic form of the NAIP that we have observed on
Northern blot analysis. It is noteworthy that serum
fatty acid abnormalities have been detected in children
with SMA (Kelley and Sladky, 1986).

The identified region of 5q13.1 contains, in
addition to the NAIP gene, a variable number of copies of
internally deleted and truncated forms of the gene. We
believe that a lack or absence of both the intact NAIP
gene and the NAIP locus with exons 5 and 6 deleted from a
given individual's genome is likely to cause SMA. In
this regard, the identification of NAIP has allowed us to
develop accurate molecular based diagnoses of SMA as well
as directing the formulation of conventional and genetic
therapies for these debilitating conditions.
Furthermore, the identification of genes showing homology
with the NAIP locus and proteins that interact with NAIP
may help in the continuing elucidation of apoptotic
mechanisms in mammalian cells.

EXAMPLES
Family Material 

Clinical diagnoses were conducted as described in
MacKenzie et al. (1993), with all patients fulfilling the
diagnostic criteria given therein. DNA was isolated from
peripheral leukocytes as described (MacKenzie et al.,
1993).

Genetic and Linkage Disequilibrium Analyses 

Genotyping with microsatellite markers was as
outlined in MacKenzie et al. (1993) and McLean et al.
(1994). The following 5q13.1 loci were used as
described: D5S112 (Brzustowicz et al., 1990), D5S351
(Hudson et al., 1992), D5S435 (Soares et al., 1993),
D5S557 (Francis et al., 1993), D5S629 and D5S637
(Clermont et al., 1994), D5S684 (Brahe et al., 1994),
Y98T, Y97T, Y116T, Y122T and CMS (Kleyn et al., 1993),
CATT (Burghes et al., 1994; McLean et al., 1994) and
MAP1B (Lien et al., 1991).

Linkage disequilibrium analyses were conducted using
parameters that can accommodate the multiple alleles seen
with microsatellite repeats. Given the complexities
inherent in disequilibrium analyses, a total of 4
different parameters for which multiple alleles may be
used were employed. These were Dij, Dij' and D' as
defined in Hedrick (1987) as well as the chi square test.
Two of these, Dij and Dij', have given the best a
posteriori positional information in a previous study on
myotonic dystrophy (Podolsky et al., 1994). The patient
and control population is as outlined in McLean et al.
(1994).
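
A minimal sketch of the three disequilibrium parameters named above is
given below, using the definitions as they are commonly stated for
Hedrick (1987): Dij is the haplotype frequency minus the product of the
allele frequencies, Dij' is Dij divided by its maximum possible
magnitude, and D' is the sum of |Dij'| weighted by the allele frequency
products. The haplotype frequencies in the example are invented for
illustration and are not data from this study.

```python
def disequilibrium(hap_freq):
    """Compute Dij, Dij' and D' from a table of haplotype frequencies.

    hap_freq[i][j] is the frequency of haplotypes carrying allele i at
    the first locus and allele j at the second; entries must sum to 1.
    """
    p = [sum(row) for row in hap_freq]        # allele frequencies, first locus
    q = [sum(col) for col in zip(*hap_freq)]  # allele frequencies, second locus

    d = [[hap_freq[i][j] - p[i] * q[j] for j in range(len(q))]
         for i in range(len(p))]

    d_norm = []
    for i in range(len(p)):
        row = []
        for j in range(len(q)):
            if d[i][j] < 0:
                d_max = min(p[i] * q[j], (1 - p[i]) * (1 - q[j]))
            else:
                d_max = min(p[i] * (1 - q[j]), (1 - p[i]) * q[j])
            row.append(d[i][j] / d_max if d_max > 0 else 0.0)
        d_norm.append(row)

    d_prime = sum(p[i] * q[j] * abs(d_norm[i][j])
                  for i in range(len(p)) for j in range(len(q)))
    return d, d_norm, d_prime


# Hypothetical two-allele examples: complete association and independence.
_, _, complete = disequilibrium([[0.5, 0.0], [0.0, 0.5]])
_, _, independent = disequilibrium([[0.25, 0.25], [0.25, 0.25]])
print(complete, independent)
```

For a disease-marker analysis the first locus would have two "alleles"
(SMA and non-SMA chromosomes) and the second as many alleles as the
microsatellite shows; the same function applies unchanged.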

Cosmid, YAC and PAC Arraying 

Cosmid and YAC contig assembly was as outlined in
Roy et al. (1994). PACs were constructed as outlined in
Ioannou et al. (1994). Using these procedures three PAC
libraries have been constructed with a combined total of
175,000 clones and propagated as individual clones in
microtiter dishes (Ioannou et al., unpublished results).
Pools derived from the three libraries (designated LLNL
PAC1, RPCI1 and RPCI2) were screened with 5q13.1 STSs.
Positive PACs were arranged into contiguous and
overlapping arrays by further analysis with additional
STSs combined with probings of Southern blots containing
PAC DNA by single copy genomic DNA and cDNA probes.

DNA Manipulation and Analysis

Four genomic libraries containing PAC 125D9 insert
were constructed by BamHI, BamHI/NotI, total and partial
Sau3AI (selected for 5 kb insert size) digestions of the
PAC genomic DNA insert and subcloned into Bluescript
vector. Sequencing of approximately 400 bp of both
termini of 200 five kb clones from the partial Sau3AI
digestion library in the manner of Chen et al. (1993) was
undertaken.

Coding sequences from the PACs were isolated by the
exon amplification procedure as described by Church et
al. (1994). PACs were digested with BamHI or BamHI and
BglII and subcloned into pSPL3. Pooled clones of each
PAC were transfected into COS-1 cells. After a 24 h
transfection total RNA was extracted. Exons were cloned
into pAMP10 (Gibco, BRL) and sequenced utilizing primer
SD2 (GTG AAC TGC ACT GTG ACA AGC TGC).

DNA sequencing was conducted on an ABI 373A
automated DNA sequencer. Two commercial human fetal
brain cDNA libraries in lambda gt (Stratagene) and lambda
ZAP (Clontech) were used for candidate transcript
isolation. The Northern blot was commercially acquired
(Clontech) and probing was performed using standard
methodology.

In general, primers used in the paper for PCR were
selected for Tm values of 60°C and can be used with the
following conditions: 30 cycles of 94°C, 60s; 60°C, 60s;
72°C, 90s. PCR primer mappings are as referred to in the
figure legends and text. Primer sequences are as
follows:

1258 ATg CTT ggA TCT CTA gAA Tgg - Sequence ID No. 3
1285 AgC AAA gAC ATg Tgg Cgg AA - Sequence ID No. 4
1343 CCA gCT CCT AgA gAA AgA Agg A - Sequence ID No. 5
1844 gAA CTA Cgg CTg gAC TCT TTT - Sequence ID No. 6
1863 CTC TCA gCC TgC TCT TCA gAT - Sequence ID No. 7
1864 AAA gCC TCT gAC gAg Agg ATC - Sequence ID No. 8
1884 CgA CTg CCT gTT CAT CTA CgA - Sequence ID No. 9
1886 TTT gTT CTC CAg CCA CAT ACT - Sequence ID No. 10
1887 CAT TTg gCA TgT TCC TTG CAA g - Sequence ID No. 11
1893 gTA gAT gAA TAG TgA TgT TTC ATA ATT - Sequence ID No. 12
1910 TgC CAC TgC CAg gCA ATC TAA - Sequence ID No. 13
1919 TAA ACA ggA CAC ggT ACA gTg - Sequence ID No. 14
1923 CAT gTT TTA AgT CTC ggT gCT CTg - Sequence ID No. 15
1926 TTA gCC AgA TgT gTT ggC ACA Tg - Sequence ID No. 16
1927 gAT TCT ATg TgA TAg gCA gCC A - Sequence ID No. 17
1933 gCC ACT gCT CCC gAT ggA TTA - Sequence ID No. 18
1974 gCT CTC AgC TgC TCA TTC AgA T - Sequence ID No. 19
1979 ACA AAg TTC ACC ACg gCT CTg - Sequence ID No. 20
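
The stated primer design target and cycling conditions can be sanity
checked with a short sketch. The text does not say how melting
temperature was estimated; the Wallace rule (2°C per A/T, 4°C per G/C)
is assumed here purely for illustration, and the helper names are
hypothetical. Only two of the listed primers are shown.

```python
def normalize(primer):
    """Strip the triplet spacing and mixed case used in the primer listing."""
    return primer.replace(" ", "").upper()


def wallace_tm(primer):
    """Approximate melting temperature (degrees C) by the Wallace rule (an assumption)."""
    seq = normalize(primer)
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


# Two primers copied from the listing above.
primers = {
    "1258": "ATg CTT ggA TCT CTA gAA Tgg",
    "1285": "AgC AAA gAC ATg Tgg Cgg AA",
}
for name, seq in primers.items():
    print(name, len(normalize(seq)), "nt, Tm ~", wallace_tm(seq), "C")

# Cycling program from the text: 30 cycles of 94 C 60 s; 60 C 60 s; 72 C 90 s.
CYCLES = 30
STEPS = [(94, 60), (60, 60), (72, 90)]  # (temperature in C, hold time in s)
total_minutes = CYCLES * sum(seconds for _, seconds in STEPS) / 60
print("hold time per run:", total_minutes, "min (ramp times excluded)")
```

Other Tm formulas (for example nearest-neighbour models) would give
different absolute values, so the figures are indicative only.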

RT-PCR 

cDNA was synthesized in a 20 µl reaction utilizing 7
µg of total RNA. The RNA was denatured for 5 minutes at
95°C and cooled to 37°C. Reverse transcription was
performed at 42°C for 1 hour after addition of 5 µl 5X
reverse transcription buffer, 2 µl 0.1 M DTT, 4 µl 2.5 mM
dNTPs, 8 units RNasin, 25 ng cDNA primer (1285) and 400
units of MMLV (Gibco, BRL). 1 µl of cDNA was utilized as
template in subsequent 50 µl PCR reactions. 1 µl of this
primary PCR was utilized as template for secondary PCR
amplifications.

Sequence Analysis

Primary DNA sequence data was edited with the TED
program (Gleeson and Hillier, 1991). As many of the
partially sequenced 200 five kb clones from the partial
Sau3AI digestion library as possible were arranged into
overlapping arrays using the XBAP Staden package (Dear
and Staden, 1991). Sequence data was also assembled and
analyzed using the GCG sequence analysis package (Genetics
Computer Group, 1991). Protein domain homologies were
found by searching the Prosite protein database (Bairoch
and Bucher, 1993). The MEMSAT program was also used to
search for transmembrane domain regions (Jones et al.,
1994).

TABLE 1

The YACs isolated in this study, their size and library of origin are listed. NCE:
National Centers of Excellence, Toronto, Ontario, Canada. ICRF: Imperial Cancer
Research Fund. CEPH: Centre d'Etude du Polymorphisme Humain.

YAC            SIZE      LIBRARY
12H1           560 kb    NCE
12H4           270 kb    NCE
[illegible]    750 kb    NCE
[illegible]    630 kb    NCE
33H10          1.3 Mb    NCE
H0416          390 kb    ICRF
E0320          440 kb    ICRF
G1138          850 kb    ICRF
A0848          350 kb    ICRF
D06100         580 kb    ICRF
D0981          450 kb    ICRF
919C2          800 kb    CEPH
755B12         1 Mb      CEPH
754H5          500 kb    CEPH

TABLE 2



PROBE            SOURCE/REFERENCE
YD33             STS developed from Alu-5'-trp PCR product of YAC D06100
[illegible]      STS developed from inter-Alu-5' PCR product of YAC 12H1 (this study)
Y14.1            STS developed from Alu-3'-ura PCR product of YAC 12H4 (this study)
Y15.1            STS developed from Alu-5'-ura PCR product of YAC 12H4 (this study)
Y9.2             STS developed from inter-Alu-5' PCR product of YAC 27H5 (this study)
Y5.6             STS developed from inter-Alu PCR product of YAC 24D6 (this study)
Y11.2            STS developed from Alu-3'-trp PCR product of YAC 33H10 (this study)
pZY8             subcloned 1.3 kb HindIII fragment from cosmid 250B6 (this study)
H7T733           Alu 33-T7 PCR product from cosmid 1H7 (this study)
p151.2           subcloned 1.2 kb inter-Alu PCR product of cosmid 15F8 (this study)
[illegible]      Alu [illegible] PCR product of cosmid 1G10 (this study)
[illegible]      BamHI/HindIII fragment of cosmid 40G1 (this study)
G3T733           Alu 33-T7 PCR product of cosmid 1G3 (this study)
pL7              liver transcript isolated with subcloned 1.1 kb BamHI/SalI
                 fragment from [illegible] (this study)
p2281.8          subcloned 1.8 kb HindIII fragment of cosmid 228C8 (this study)
F933             inter-Alu PCR product of cosmid 1F9 (this study)
pGA1             fetal brain transcript isolated with cosmid 250B6
β-glucuronidase  (Oshima et al., 1987)
MAP1B            (Lien et al., 1991)
Y122T            (Kleyn et al., 1993)
D5S351           (Yaraghi et al., in press)
CMS-1            (Kleyn et al., 1993)
D5S557           (Francis et al., 1993)
Y98T             (Kleyn et al., 1993)
D5S112           (Brzustowicz et al., 1990)
Y97T             (Kleyn et al., 1993)
Y112U            (Kleyn et al., 1993)
Y88T             (Kleyn et al., 1993)
Y119T            (Kleyn et al., 1993)
Y116U            (Kleyn et al., 1993)
CATT-1           (Burghes et al., 1994; McLean et al., in press)
Y55U             (Kleyn et al., 1993)
D5S127           (Sherrington et al., 1991)
Y38T             (Kleyn et al., 1993)
D5S435           (Soares et al., 1993)
D5S125           (Hudson et al., 1992)
Y107U            (Kleyn et al., 1993)
Y97U             (Kleyn et al., 1993)
D5F149 (C212)    (Melki et al., 1994)
D5F151 (C171)    (Melki et al., 1994)
D5F150 (C272)    (Melki et al., 1994)
D5F153 (C161)    (Melki et al., 1994)
D5S637           (Clermont et al., 1994)
D5S629           (Clermont et al., 1994)

Table 3

Comparison of the GA1 component of the neuronal apoptosis inhibitor protein gene with the
inhibitor of apoptosis polypeptides of the viruses Cydia pomonella and Orgyia
pseudotsugata.

                 1                                                   50
Cydia pomonella
Orgyia pseudots
cGA1 consensus   TR7VDKPQKM A7QQXASDER ISQFDHNLLP ELSALLGLDA VQLAXELEEE

                 51                                                 100
Cydia pomonella
Orgyia pseudots
cGA1 consensus   EOKERAKMQX GYNSQMRSEA KRLKTFVTYE PYSSWIPQEM AAAGFYFTGV

                 101                                                150
Cydia pomonella
Orgyia pseudots                                                      MS
cGA1 consensus   KSGIQCFCCS LILFGAGLTR LPIEDHKRFH PDCGFLLNKD VGN2AXYDIR

                 151                                                200
Cydia pomonella           M SDLR..LEEV RUTTFEKWP. .VSFLSPETM AKNGFYTLGR
Orgyia pseudots  SRAIGAPQEG ADMX..NKAA RLGTYTNWP. .VQFLEPSRM AASGFYYLGR
cGA1 consensus   VXNLKSRXRG GKMRYQEEEA R1ASFRNWPF YVQGISPCVL S2AGFVFTGK

                 201                                                250
Cydia pomonella  SDEVRCAFCX VEIMRWKEGE DPAADHXXWA PQCPFVKGID VCGSI
Orgyia pseudots  GDEVRCAFCK VEITNWVRGD DPETDHKRWA P0CPFVRK
cGA1 consensus   QDTVQCFSCG GCLGNWEEGD DPVKEHAKWF PKCEFLRSXX SSEEITQYIQ

                 251                                                300
Cydia pomonella         VT7 NNIQNT7THD T2IGPA.... HPKYAHEAAR VXSFHNWPRC
Orgyia pseudots          NA HDTPHDRAPP ARSAAA.... HPQYATEAAR LRTFAEWPRG
cGA1 consensus   SYKGFVDITG EHFVNSWV0R ELPMASAYCN DSIFAYEELR LDSFKDWPRE

                 301                                                350
Cydia pomonella  MKQRPEQMAD AGFFYTGYGD NTKCFYCDGG LKDWEPEDVP WEQHVRWFDR
Orgyia pseudots  LKQRPEE1AE AGFFYTGQGD KTRCFCCDGG LKDWEPDDAP WQQHARWYDR
cGA1 consensus   SAVGVAA1AK AGLFYTG1KD IVQCFSCGGC LEKWQEGDDP LDDHTRCFPN

                 351                                                400
Cydia pomonella  CAYVQLVXGR DYVQJCVI... TEACVLPGEN TTVSTAAPVS EPIPETKIEK
Orgyia pseudots  CEYVLLVKGR DFVQRVM... TEACWRDAB N EPHIER
cGA1 consensus   CPFLQNMRSS AEVTPDLQSR GELCELLETT SESNLEDSIA VGPIVPEMAQ

                 401                                                450
Cydia pomonella  EPO VEDSKLCKIC YVEE CIV CFVPCGHWA
Orgyia pseudots  PAV EAE VADDRLCKIC LGAE KTV CFVPCGHWA
cGA1 consensus   GEAQWFQEAK KLNEQLRAAY TSASFRHMSL LDISSD1ATD HLLGCDLS1A

                 451                                                500
Cydia pomonella  CAXCALSVDK CPMCRKIVTS VLKVYFS
Orgyia pseudots  CGKCAAGVTT CPVCRGOLDK AVRMYQV
cGA1 consensus   SKHISKPVQE PLVLPEVFGN LNSVWCVEGE AGSGKTVLLK KIAFLWASGC
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TABLE 3 (continued) 



                 501                                                550
cGA1 consensus   CPLLNRFQLV FYLSLSSTRP DEGLASIICD QLLEKEGSVT EMCMRNIIQQ
cGA1 consensus   LXNQVLFLLD DYKEICSIPQ VIGKLIQKNH LSRTCLLIAV RTNRARDIRR
cGA1 consensus   YLETILEIQA FPFYNTVCIL RKLFSHNMTR LRKFMVYFGK NQSLQKIQKT
cGA1 consensus   PLFVAAICAH WFQYFFDPSF DDVAVFKSYM ERLSLRNKAT AEILKATVSS
cGA1 consensus   CGELALKGFF SCCFEFNDDD LAEAGVDEDE DLTMCLMSKF TAQRLRPFYR
cGA1 consensus   FLSPAFQEFL AGMRLIELLD SDRQEHQDLG LYHLKQINSP MMTVSAYNNF
cGA1 consensus   LNYVSSLPST KAGPKIVSHL LHLVDNKESL ENISENDDYL KHQPEISLQM
cGA1 consensus   QLLRGLWQIC POAYFSMVSE KLLVLALKTA YQSNTVAACS PFVLQFLQGR
cGA1 consensus   TLTLGALNLQ YFFBHPESLS LLRSIHFSIR GNKTSPRAHF SVLETCFDKS
cGA1 consensus   QVPTIDQDYA SAFEPKNEWE RNLAEKEDNV KSYMDMQRRA SPDLSTGYWX
cGA1 consensus   LSPKQYKIPC LEVDVNDIDV VGQDMLEILM TVFSASQRIE LHLNHSRGFI
cGA1 consensus   ESIRPALELS KASVTKCSIS KLELSAAEQE LLLTLPSLES LEVSGTIQSQ
cGA1 consensus   DQIFPNLDKF LCLKELSVDL EGNINVFSVI PEEFPNFHHM EKLLIQISAE
cGA1 consensus   S
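
The degree of homology shown in Table 3 can be expressed as percent identity over the aligned columns. The sketch below is a minimal illustration, not the method used to build the table: the function name and the treatment of "." as a gap character are assumptions, and the three 20-residue strings are positions 321 to 340 of the alignment copied from the table.

```python
def percent_identity(seq_a, seq_b, gap="."):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Alignment positions 321-340 as printed in Table 3 (assumed excerpt).
cydia = "NTKCFYCDGGLKDWEPEDVP"
orgyia = "KTRCFCCDGGLKDWEPDDAP"
cga1 = "IVQCFSCGGCLEKWQEGDDP"

print(percent_identity(cydia, cga1))
print(percent_identity(orgyia, cga1))
```

Restricting the comparison to gap-free columns is one common convention; counting gaps as mismatches would give lower values for the same alignment.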
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SEQUENCE LISTING 



(1) GENERAL INFORMATION:
(i) APPLICANT:

NAME: UNIVERSITY OF OTTAWA
STREET: 550 Cumberland Street
CITY: Ottawa
STATE: Ontario
COUNTRY: Canada
POSTAL CODE (ZIP): K1N 6N5
TELEPHONE: 613-564-5804
TELEFAX: 613-564-5952

NAME: RESEARCH DEVELOPMENT CORPORATION OF JAPAN
STREET: 4-1-8, Honcho, Kawaguchi-shi
CITY: Saitama 332
COUNTRY: Japan
POSTAL CODE (ZIP): none

NAME: MacKENZIE, Alex E.
STREET: 35 Rockliffe Way
CITY: Ottawa
STATE: Ontario
COUNTRY: Canada
POSTAL CODE (ZIP): K1N 1A3

NAME: KORNELUK, Robert G.
STREET: 1901 Tweed Avenue
CITY: Ottawa
STATE: Ontario
COUNTRY: Canada
POSTAL CODE (ZIP): K1G 2L8

NAME: MAHADEVAN, Mani S.
STREET: 818 South Grammon Road, Apt. 4
CITY: Madison
STATE: Wisconsin
COUNTRY: United States of America
POSTAL CODE (ZIP): 53719

NAME: MCLEAN, Michael
STREET: 1 Halesmanor Crt.
CITY: Guelph
STATE: Ontario
COUNTRY: Canada
POSTAL CODE (ZIP): N1G 4E1

NAME: ROY, Natalie
STREET: 6 McLeod Street
CITY: Ottawa
STATE: Ontario
COUNTRY: Canada
POSTAL CODE (ZIP): K2P 0Z5

NAME: IKEDA, Joh-E.
STREET: 31-1 Kamimeguro 5-chome, Meguro-ku
CITY: Tokyo
COUNTRY: Japan
POSTAL CODE: 153

(ii) TITLE OF INVENTION: NEURONAL APOPTOSIS INHIBITOR PROTEIN,
GENE SEQUENCE AND MUTATIONS CAUSATIVE OF SPINAL MUSCULAR ATROPHY

(iii) NUMBER OF SEQUENCES: 20

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30 (EPO)

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 5502 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(viii) POSITION IN GENOME:
(C) UNITS: bp

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:



TTCCGGCTGG ACGTTGCCCT GTGTACCTCT TCGACTGCCT GTTCATCTAC GACGAACCCC 60

GGGTATTGAC CCCAGACAAC AATGCCACTT CATATTGCAT GAAGACAAAA GGTCCTGTGC 120

TCACCTGGGA CCCTTCTGGA CGTTGCCCTG TGTTCCTCTT CGCCTGCCTG TTCATCTACG 180

ACGAACCCCG GGTATTGACC CCAGACAACA ATGCCACTTC ATATTGGGGA CTTCGTCTGG 240

GATTCCAAGG TGCATTCATT GCAAAGTTCC TTAAATATTT TCTCACTGCT TCCTACTAAA 300

GGACGGACAG AGCATTTGTT CTTCAGCCAC ATACTTTCCT TCCACTGGCC AGCATTCTCC 360

TCTATTAGAC TAGAACTGTG GATAAACCTC AGAAAATGGC CACCCAGCAG AAAGCCTCTG 420

ACGAGAGGAT CTCCCAGTTT GATCACAATT TGCTGCCAGA GCTGTCTGCT CTTCTGGGCC 480

TAGATGCAGT TCAGTTGGCA AAGGAACTAG AAGAAGAGGA GCAGAAGGAG CGAGCAAAAA 540

TGCAGAAAGG CTACAACTCT CAAATGCGCA GTGAAGCAAA AAGGTTAAAG ACTTTTGTGA 600

CTTATGAGCC GTACAGCTCA TGGATACCAC AGGAGATGGC GCCCGCTGGG TTTTACTTCA 660

CTGGGGTAAA ATCTGGGATT CAGTGCTTCT GCTGTAGCCT AATCCTCTTT GGTGCCGGCC 720

TCACGAGACT CCCCATAGAA GACCACAAGA GGTTTCATCC AGATTGTGGG TTCCTTTTGA 780

ACAAGGATGT TGGTAACATT GCCAAGTACG ACATAAGGGT GAAGAATCTG AAGAGCAGGC 840

TGACAGGAGG TAAAATGAGG TACCAAGAAG AGGAGGCTAG ACTTGCATCC TTCAGGAACT 900

GGCCATTTTA TGTCCAAGGG ATATCCCCTT GTGTGCTCTC AGAGGCTGGC TTTGTCTTTA 960

CAGGTAAACA GGACACGGTA CAGTGTTTTT CCTGTGGTGG ATGTTTAGGA AATTGGGAAG 1020

AAGGAGATGA TCCTTGGAAG GAACATGCCA AATGGTTCCC CAAATGTGAA TTTCTTCGGA 1080

GTAAGAAATC CTCAGAGGAA ATTACCCAGT ATATTCAAAG CTACAAGGGA TTTGTTGACA 1140

TAACGGGAGA ACATTTTGTG AATTCCTGGG TCCAGAGAGA ATTACCTATG GCATCAGCTT 1200

ATTGCAATGA CAGCATCTTT GCTTACGAAG AACTACGGCT GGACTCTTTT AAGGACTGGC 1260

CCCGGGAATC AGCTGTGGGA GTTGCAGCAC TGGCCAAAGC AGGTCTTTTC TACACAGGTA 1320

TAAAGGACAT CGTCCAGTGC TTTTCCTGTG GAGGGTGTTT AGAGAAATGG CAGGAAGGTG 1380

ATGACCCATT AGACGATCAC ACCAGATGTT TTCCCAATTG TCCATTTCTC CAAAATATGA 1440

AGTCCTCTGC GGAAGTGACT CCAGACCTTC AGAGCCGTGG TGAACTTTGT GAATTACTGG 1500

AAACCACAAG TGAAAGCAAT CTTGAAGATT CAATAGCAGT TGGTCCTATA GTGCCAGAAA 1560

TGGCACAGGG TGAAGCCCAG TGGTTTCAAG AGGCAAAGAA TCTGAATGAG CAGCTGAGAG 1620

CAGCTTATAC CAGCGCCAGT TTCCGCCACA TGTCTTTGCT TGATATCTCT TCCGATCTGG 1680

CCACGGACCA CTTGCTGGGC TGTGATCTGT CTATTGCTTC AAAACACATC AGCAAACCTG 1740

TGCAAGAACC TCTGGTGCTG CCTGAGGTCT TTGGCAACTT GAACTCTGTC ATGTGTGTGG 1800

AGGGTGAAGC TGGAAGTGGA AAGACGGTCC TCCTGAAGAA AATAGCTTTT CTGTGGGCAT 1860

CTGGATGCTG TCCCCTGTTA AACAGGTTCC AGCTGGTTTT CTACCTCTCC CTTAGTTCCA 1920

CCAGACCAGA CGAGGGGCTG GCCAGTATCA TCTGTGACCA GCTCCTAGAG AAAGAAGGAT 1980

CTGTTACTGA AATGTGCATG AGGAACATTA TCCAGCAGTT AAAGAATCAG GTCTTATTCC 2040

TTTTAGATGA CTACAAAGAA ATATGTTCAA TCCCTCAAGT CATAGGAAAA CTGATTCAAA 2100

AAAACCACTT ATCCCGGACC TGCCTATTGA TTGCTGTCCG TACAAACAGG GCCAGGGACA 2160

TCCGCCGATA CCTAGAGACC ATTCTAGAGA TCCAAGCATT TCCCTTTTAT AATACTGTCT 2220

GTATATTACG GAAGCTCTTT TCACATAATA TGACTCGTCT GCGAAAGTTT ATGGTTTACT 2280

TTGGAAAGAA CCAAAGTTTG CAGAAGATAC AGAAAACTCC TCTCTTTGTG GCGGCGATCT 2340

GTGCTCATTG GTTTCAGTAT CCTTTTGACC CATCCTTTGA TGATGTGGCT GTTTTCAAGT 2400

CCTATATGGA ACGCCTTTCC TTAAGGAACA AAGCGACAGC TGAAATTCTC AAAGCAACTG 2460

TGTCCTCCTG TGGTGAGCTG GCCTTCAAAG GGTTTTTTTC ATGTTGCTTT GAGTTTAATG 2520

ATGATGATCT CGCAGAAGCA GGGGTTGATG AAGATGAAGA TCTAACCATG TGCTTGATGA 2580

GCAAATTTAC AGCCCAGAGA CTAAGACCAT TCTACCGGTT TTTAAGTCCT GCCTTCCAAG 2640

AATTTCTTGC GGGGATGAGG CTGATTGAAC TCCTGGATTC AGATAGGCAG GAACATCAAG 2700

ATTTGGGACT GTATCATTTG AAACAAATCA ACTCACCCAT GATGACTGTA AGCGCCTACA 2760

ACAATTTTTT GAACTATGTC TCCAGCCTCC CTTCAACAAA AGCAGGGCCC AAAATTGTGT 2820

CTCATTTGCT CCATTTAGTG GATAACAAAG AGTCATTGGA GAATATATCT GAAAATGATG 2880

ACTACTTAAA GCACCAGCCA GAAATTTCAC TGCAGATGCA GTTACTTAGG GGATTGTGGC 2940
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AAATTTGTCC ACAAGCTTAC TTTTCAATGG TTTCAGAACA TTTACTGGTT CTTGCCCTGA 3000 

AAACTGCTTA TCAAAGCAAC ACTGTTGCTG CGTGTTCTCC ATTTGTTTTG CAATTCCTTC 3060 

AAGGGAGAAC ACTGACTTTG GGTGCGCTTA ACTTACAGTA CTTTTTCGAC CACCCAGAAA 3120 

GCTTGTCATT GTTGAGGAGC ATCCACTTCT CAATACGAGG AAATAAGACA TGACCCAGAG 3180 

CACATTTTTC AGTTCTGGAA ACATGTTTTG ACAAATCACA GGTGCCAACT ATAGATCAGG 3240 

ACTATGCTTC TGCCTTTGAA CCTATGAATG AATGGGAGCG AAATTTAGCT GAAAAAGAGG 3300 

ATAATGTAAA GAGCTATATG GATATGCAGC GCAGGGCATC ACCAGACCTT AGTACTGGCT 3360 

ATTGGAAACT TTCTCCAAAG CAGTACAAGA TTCCCTGTCT AGAAGTCGAT GTGAATGATA 3420 

TTGATGTTGT AGGCCAGGAT ATGCTTGAGA TTCTAATGAC AGTTTTCTCA GCTTCACAGC 3480 

GCATCGAACT CCATTTAAAC CACAGCAGAG GCTTTATAGA AAGCATCCGC CCAGCTCTTG 3540 

AGCTGTCTAA GGCCTCTGTC ACCAAGTGCT CCATAAGCAA GTTGGAACTC AGCGCAGCCG 3600 

AACAGGAACT GCTTCTCACC CTGCCTTCCC TGGAATCTCT TGAAGTCTCA GGGACAATCC 3660

AGTCACAAGA CCAAATCTTT CCTAATCTGG ATAAGTTCCT GTGCCTGAAA GAACTGTCTG 3720 

TGGATCTGGA GGGCAATATA AATGTTTTTT CAGTCATTCC TGAAGAATTT CCAAACTTCC 3780

ACCATATGGA GAAATTATTG ATCCAAATTT CAGCTGAGTA TGATCCTTCC AAACTAGTTG 3840 

CCAGTTTGCC AAATTTTATT TCTCTGAAGA TATTAAATCT TGAAGGCCAG CAATTTCCTG 3900 

ATGAGGAAAC ATCAGAAAAA TTTGCCTACA TTTTAGGTTC TCTTAGTAAC CTGGAAGAAT 3960 

TGATCCTTCC TACTGGGGAT GGAATTTATC GAGTGGCCAA ACTGATCATC CAGCAGTGTC 4020 

AGCAGCTTCA TTGTCTCCGA GTCCTCTCAT TTTTCAAGAC TTTGAATGAT GACAGCGTGG 4080 

TGGAAATTGG TTAAAAATGT GTCTGCAGGC ACACAGGACG TGCCTTCACC CCCATCTGAC 4140 

TATGTGGAAA GAGTTGACAG TCCCATGGCA TACTCTTCCA ATGGCAAAGT GAATGACAAG 4200 

CGGTTTTATC CAGAGTCTTC CTATAAATCC ACGCCGGTTC CTGAAGTGGT TCAGGAGCTT 4260 

CCATTAACTT CGCCTGTGGA TGACTTCAGG CAGCCTCGTT ACAGCAGCGG TGGTAACTTT 4320

GAGACACCTT CAAAAAGAGC ACCTGCAAAG GGAAGAGCAG GAAGGTCAAA GAGAACAGAG 4380 

CAAGATCACT ATGAGACAGA CTACACAACT GGCGGCGAGT CCTGTGATGA GCTGGAGGAG 4440

GACTGGATCA GGGAATATCC ACCTATCACT TCAGATCAAC AAAGACAACT GTACAAGAGG 4500 

AATTTTGACA CTGGCCTACA GGAATACAAG AGCTTACAAT CAGAACTTGA TGAGATCAAT 4560 

AAAGAACTCT CCCGTTTGGA TAAAGAATTG GATGACTATA GAGAAGAAAG TCAAGAGTAC 4620 

ATGGCTGCTG CTGATGAATA CAATAGACTG AAGCAAGTGA AGGGATCTGC AGATTACAAA 4680 

AGTAAGAAGA ATCATTGCAA GCAGTTAAAC AGCAAATTGT CACACATCAA GAAGATGGTT 4740 

GGAGACTATG ATAGACAGAA AACATAGAAG GCTGATGCCA AGTTGTTTGA GAAATTAAGT 4800 

ATCTGACATC TCTGCAATCT TCTCAGAAGG CAAATGACTT TGGACCATAA CCCCGGAAGC 4860 

CAAACCTCTG TGAGCATCAC AGTTTTGGTT GCTTTAATAT CATCAGTATT GAAGCATTTT 4920 
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CACTTTTTTC CACATAAGGA AACTGGGTTC CTGCAATGAA GTCTCTGAAG TGAAACTGCT 5220 

TGTTTCCTAG CACACACTTT TGGTTAAGTC TGTTTTATGA CTTCATTAAT AATAAATTCC 5280 

GGCATCATAC AGCTACTCCT CCCTACCGCC ACCTCCACAG ACACCACTCT CCTGGTTCCA 5340 

TCTCCTCTGC TGCTTCTAGC TCCCTGCTCT GGCTTCAAGG TGCGCAGGAC CTGCTTCCTT 5400 

GGTGATCCTC TGTAGTCTCC CACACCCCAC ATTATCTACA AACTGATGAC TCCTAATTTA 5460 

CATCTCCAGC TCAGACCTCT CCATCAATCC CAACGCATAC AC 5502 
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(2) INFORMATION FOR SEQ ID NO: 2: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1233 amino acids 

(B) TYPE: amino acid
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Ala Thr Gln Gln Lys Ala Ser Asp Glu Arg Ile Ser Gln Phe Asp
1 5 10 15

His Asn Leu Leu Pro Glu Leu Ser Ala Leu Leu Gly Leu Asp Ala Val
20 25 30

Gln Leu Ala Lys Glu Leu Glu Glu Glu Glu Gln Lys Glu Arg Ala Lys
35 40 45

Met Gln Lys Gly Tyr Asn Ser Gln Met Arg Ser Glu Ala Lys Arg Leu
50 55 60

Lys Thr Phe Val Thr Tyr Glu Pro Tyr Ser Ser Trp Ile Pro Gln Glu
65 70 75 80

Met Ala Ala Ala Gly Phe Tyr Phe Thr Gly Val Lys Ser Gly Ile Gln
85 90 95

Cys Phe Cys Cys Ser Leu Ile Leu Phe Gly Ala Gly Leu Thr Arg Leu
100 105 110

Pro Ile Glu Asp His Lys Arg Phe His Pro Asp Cys Gly Phe Leu Leu
115 120 125

Asn Lys Asp Val Gly Asn Ile Ala Lys Tyr Asp Ile Arg Val Lys Asn
130 135 140

Leu Lys Ser Arg Leu Arg Gly Gly Lys Met Arg Tyr Gln Glu Glu Glu
145 150 155 160

Ala Arg Leu Ala Ser Phe Arg Asn Trp Pro Phe Tyr Val Gln Gly Ile
165 170 175

Ser Pro Cys Val Leu Ser Glu Ala Gly Phe Val Phe Thr Gly Lys Gln
180 185 190

Asp Thr Val Gln Cys Phe Ser Cys Gly Gly Cys Leu Gly Asn Trp Glu
195 200 205

Glu Gly Asp Asp Pro Trp Lys Glu His Ala Lys Trp Phe Pro Lys Cys
210 215 220

Glu Phe Leu Arg Ser Lys Lys Ser Ser Glu Glu Ile Thr Gln Tyr Ile
225 230 235 240

Gln Ser Tyr Lys Gly Phe Val Asp Ile Thr Gly Glu His Phe Val Asn
245 250 255

Ser Trp Val Gln Arg Glu Leu Pro Met Ala Ser Ala Tyr Cys Asn Asp
260 265 270

Ser Ile Phe Ala Tyr Glu Glu Leu Arg Leu Asp Ser Phe Lys Asp Trp
275 280 285

Pro Arg Glu Ser Ala Val Gly Val Ala Ala Leu Ala Lys Ala Gly Leu
290 295 300

Phe Tyr Thr Gly Ile Lys Asp Ile Val Gln Cys Phe Ser Cys Gly Gly
305 310 315 320

Cys Leu Glu Lys Trp Gln Glu Gly Asp Asp Pro Leu Asp Asp His Thr
325 330 335

Arg Cys Phe Pro Asn Cys Pro Phe Leu Gln Asn Met Lys Ser Ser Ala
340 345 350

Glu Val Thr Pro Asp Leu Gln Ser Arg Gly Glu Leu Cys Glu Leu Leu
355 360 365

Glu Thr Thr Ser Glu Ser Asn Leu Glu Asp Ser Ile Ala Val Gly Pro
370 375 380

Ile Val Pro Glu Met Ala Gln Gly Glu Ala Gln Trp Phe Gln Glu Ala
385 390 395 400

Lys Asn Leu Asn Glu Gln Leu Arg Ala Ala Tyr Thr Ser Ala Ser Phe
405 410 415

Arg His Met Ser Leu Leu Asp Ile Ser Ser Asp Leu Ala Thr Asp His
420 425 430

Leu Leu Gly Cys Asp Leu Ser Ile Ala Ser Lys His Ile Ser Lys Pro
435 440 445

Val Gln Glu Pro Leu Val Leu Pro Glu Val Phe Gly Asn Leu Asn Ser
450 455 460

Val Met Cys Val Glu Gly Glu Ala Gly Ser Gly Lys Thr Val Leu Leu
465 470 475 480

Lys Lys Ile Ala Phe Leu Trp Ala Ser Gly Cys Cys Pro Leu Leu Asn
485 490 495

Arg Phe Gln Leu Val Phe Tyr Leu Ser Leu Ser Ser Thr Arg Pro Asp
500 505 510

Glu Gly Leu Ala Ser Ile Ile Cys Asp Gln Leu Leu Glu Lys Glu Gly
515 520 525

Ser Val Thr Glu Met Cys Met Arg Asn Ile Ile Gln Gln Leu Lys Asn
530 535 540

Gln Val Leu Phe Leu Leu Asp Asp Tyr Lys Glu Ile Cys Ser Ile Pro
545 550 555 560

Gln Val Ile Gly Lys Leu Ile Gln Lys Asn His Leu Ser Arg Thr Cys
565 570 575

Leu Leu Ile Ala Val Arg Thr Asn Arg Ala Arg Asp Ile Arg Arg Tyr
580 585 590

Leu Glu Thr Ile Leu Glu Ile Gln Ala Phe Pro Phe Tyr Asn Thr Val
595 600 605

Cys Ile Leu Arg Lys Leu Phe Ser His Asn Met Thr Arg Leu Arg Lys
610 615 620

Phe Met Val Tyr Phe Gly Lys Asn Gln Ser Leu Gln Lys Ile Gln Lys
625 630 635 640

Thr Pro Leu Phe Val Ala Ala Ile Cys Ala His Trp Phe Gln Tyr Pro
645 650 655

Phe Asp Pro Ser Phe Asp Asp Val Ala Val Phe Lys Ser Tyr Met Glu
660 665 670

Arg Leu Ser Leu Arg Asn Lys Ala Thr Ala Glu Ile Leu Lys Ala Thr
675 680 685

Val Ser Ser Cys Gly Glu Leu Ala Leu Lys Gly Phe Phe Ser Cys Cys
690 695 700

Phe Glu Phe Asn Asp Asp Asp Leu Ala Glu Ala Gly Val Asp Glu Asp
705 710 715 720

Glu Asp Leu Thr Met Cys Leu Met Ser Lys Phe Thr Ala Gln Arg Leu
725 730 735

Arg Pro Phe Tyr Arg Phe Leu Ser Pro Ala Phe Gln Glu Phe Leu Ala
740 745 750

Gly Met Arg Leu Ile Glu Leu Leu Asp Ser Asp Arg Gln Glu His Gln
755 760 765

Asp Leu Gly Leu Tyr His Leu Lys Gln Ile Asn Ser Pro Met Met Thr
770 775 780

Val Ser Ala Tyr Asn Asn Phe Leu Asn Tyr Val Ser Ser Leu Pro Ser
785 790 795 800

Thr Lys Ala Gly Pro Lys Ile Val Ser His Leu Leu His Leu Val Asp
805 810 815

Asn Lys Glu Ser Leu Glu Asn Ile Ser Glu Asn Asp Asp Tyr Leu Lys
820 825 830

His Gln Pro Glu Ile Ser Leu Gln Met Gln Leu Leu Arg Gly Leu Trp
835 840 845

Gln Ile Cys Pro Gln Ala Tyr Phe Ser Met Val Ser Glu His Leu Leu
850 855 860

Val Leu Ala Leu Lys Thr Ala Tyr Gln Ser Asn Thr Val Ala Ala Cys
865 870 875 880

Ser Pro Phe Val Leu Gln Phe Leu Gln Gly Arg Thr Leu Thr Leu Gly
885 890 895

Ala Leu Asn Leu Gln Tyr Phe Phe Asp His Pro Glu Ser Leu Ser Leu
900 905 910

Leu Arg Ser Ile His Phe Ser Ile Arg Gly Asn Lys Thr Ser Pro Arg
915 920 925

Ala His Phe Ser Val Leu Glu Thr Cys Phe Asp Lys Ser Gln Val Pro
930 935 940

Thr Ile Asp Gln Asp Tyr Ala Ser Ala Phe Glu Pro Met Asn Glu Trp
945 950 955 960

Glu Arg Asn Leu Ala Glu Lys Glu Asp Asn Val Lys Ser Tyr Met Asp
965 970 975

Met Gln Arg Arg Ala Ser Pro Asp Leu Ser Thr Gly Tyr Trp Lys Leu
980 985 990

Ser Pro Lys Gln Tyr Lys Ile Pro Cys Leu Glu Val Asp Val Asn Asp
995 1000 1005

Ile Asp Val Val Gly Gln Asp Met Leu Glu Ile Leu Met Thr Val Phe
1010 1015 1020

Ser Ala Ser Gln Arg Ile Glu Leu His Leu Asn His Ser Arg Gly Phe
1025 1030 1035 1040

Ile Glu Ser Ile Arg Pro Ala Leu Glu Leu Ser Lys Ala Ser Val Thr
1045 1050 1055

Lys Cys Ser Ile Ser Lys Leu Glu Leu Ser Ala Ala Glu Gln Glu Leu
1060 1065 1070

Leu Leu Thr Leu Pro Ser Leu Glu Ser Leu Glu Val Ser Gly Thr Ile
1075 1080 1085

Gln Ser Gln Asp Gln Ile Phe Pro Asn Leu Asp Lys Phe Leu Cys Leu
1090 1095 1100

Lys Glu Leu Ser Val Asp Leu Glu Gly Asn Ile Asn Val Phe Ser Val
1105 1110 1115 1120

Ile Pro Glu Glu Phe Pro Asn Phe His His Met Glu Lys Leu Leu Ile
1125 1130 1135

Gln Ile Ser Ala Glu Tyr Asp Pro Ser Lys Leu Val Ala Ser Leu Pro
1140 1145 1150

Asn Phe Ile Ser Leu Lys Ile Leu Asn Leu Glu Gly Gln Gln Phe Pro
1155 1160 1165

Asp Glu Glu Thr Ser Glu Lys Phe Ala Tyr Ile Leu Gly Ser Leu Ser
1170 1175 1180

Asn Leu Glu Glu Leu Ile Leu Pro Thr Gly Asp Gly Ile Tyr Arg Val
1185 1190 1195 1200

Ala Lys Leu Ile Ile Gln Gln Cys Gln Gln Leu His Cys Leu Arg Val
1205 1210 1215

Leu Ser Phe Phe Lys Thr Leu Asn Asp Asp Ser Val Val Glu Ile Gly
1220 1225 1230
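
The relationship between the cDNA of SEQ ID NO: 1 and the protein of SEQ ID NO: 2 can be checked by translating the coding region with the standard genetic code. The sketch below is illustrative only: the function name is hypothetical, and the 51-base excerpt is assumed to start at the ATG that immediately precedes GCCACCCAGCAG in SEQ ID NO: 1. Its translation can be compared by eye with the first 17 residues of SEQ ID NO: 2.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate a cDNA string codon by codon, stopping at the first stop codon."""
    cdna = cdna.upper().replace(" ", "")
    protein = []
    for i in range(0, len(cdna) - 2, 3):
        residue = CODON_TABLE[cdna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# Assumed excerpt of SEQ ID NO: 1 starting at the initiating ATG.
orf_start = "ATGGCCACCC AGCAGAAAGC CTCTGACGAG AGGATCTCCC AGTTTGATCA C"
print(translate(orf_start))
```

Spaces in the input are removed before translation so that sequence can be pasted in the ten-base blocks used by the listing.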



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(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
ATGCTTGGAT CTCTAGAATG G 21 
(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
AGCAAAGACA TGTGGCGGAA 20 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
CCAGCTCCTA GAGAAAGAAG GA 22 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
GAACTACGGC TGGACTCTTT T 21 
(2) INFORMATION FOR SEQ ID NO: 7: 
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(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
CTCTCAGCCT GCTCTTCAGA T 21 
(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
AAAGCCTCTG ACGAGAGGAT C 21 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGACTGCCTG TTCATCTACG A 21 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
TTTGTTCTCC AGCCACATAC T 21 
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(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CATTTGGCAT GTTCCTTCCA AG 22 
(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
GTAGATGAAT ACTGATGTTT CATAATT 27 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
TGCCACTGCC AGGCAATCTA A 21 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
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TAAACAGGAC ACGGTACAGT G 21 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:
CATGTTTTAA GTCTCGGTGC TCTG 24 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
TTAGCCAGAT GTGTTGGCAC ATG 23 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GATTCTATGT GATAGGCAGC CA 22 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
GCCACTGCTC CCGATGGATT A 21 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GCTCTCAGCT GCTCATTCAG AT 22 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
ACAAAGTTCA CCACGGCTCT G 21 
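
The short sequences of SEQ ID NOS: 3 to 20 can be placed on the cDNA of SEQ ID NO: 1 by searching for each oligonucleotide and for its reverse complement. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the template is the 60-base row of SEQ ID NO: 1 numbered 1621 to 1680 in the listing, and SEQ ID NO: 4 is used as the example. Exact matching is assumed, so an oligonucleotide with any mismatch would not be found.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def locate_primer(template, primer, first_base=1):
    """Return (strand, start, end) of an exact primer match, or None.

    Coordinates are 1-based positions on the template's plus strand;
    first_base is the listing position of the template's first base.
    """
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        index = template.find(query)
        if index != -1:
            start = first_base + index
            return strand, start, start + len(primer) - 1
    return None


# Assumed excerpt: SEQ ID NO: 1, bases 1621-1680.
template = (
    "CAGCTTATACCAGCGCCAGTTTCCGCCACA"
    "TGTCTTTGCTTGATATCTCTTCCGATCTGG"
)
seq_id_4 = "AGCAAAGACATGTGGCGGAA"

print(locate_primer(template, seq_id_4, first_base=1621))
```

A "-" strand result means the oligonucleotide matches the complementary strand, as would be expected for the reverse member of a PCR primer pair.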
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THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE 
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS: 

1. A human gene which maps to the SMA containing region
of chromosome 5q13, said gene comprising exons 1 through
17 of approximately 5.5 kb and having a restriction map
for exons 2 through 11 as shown in Figure 8.

2. A human gene which maps to the SMA containing region
of chromosome 5q13, said gene comprising exons 1 through
17 of approximately 5.5 kb and having a restriction map
for exons 2 through 16 as shown in Figure 9D.

3. A human gene of claim 1 or 2 wherein exons 5 through
16 code for the NAIP protein having an amino acid
sequence biologically functionally equivalent to the
amino acid sequence of Sequence ID No. 2.

4. A human gene of claim 1 or 2 wherein exons 5 through
16 have a cDNA sequence biologically functionally
equivalent to the cDNA sequence of Sequence ID No. 1.

5. A purified nucleotide sequence comprising genomic
DNA, cDNA, mRNA, anti-sense DNA or homologous DNA
corresponding to the cDNA sequence of Sequence ID No. 1.

6. A DNA molecule comprising a DNA sequence of Sequence
ID No. 1.

7. A DNA molecule comprising a DNA sequence coding for
the NAIP protein having Sequence ID No. 2.

8. A purified DNA sequence consisting essentially of
DNA Sequence ID No. 1.

9. A purified DNA sequence consisting essentially of a
DNA sequence coding for amino acid Sequence ID No. 2.

10. A purified DNA sequence comprising at least 18
sequential bases of Sequence ID No. 1.

11. A DNA probe comprising a DNA sequence of claim 10.

12. A PCR primer comprising a DNA sequence of claim 10.

13. A DNA hybridization molecule comprising a DNA
sequence of claim 10.

14. A purified DNA sequence of claim 10, 11, 12 or 13
wherein said DNA sequence is selected from exons 1
through 4 and 17 of Table 4.

15. A purified DNA sequence of claim 10, 11, 12 or 13
wherein said DNA sequence is selected from exons 5
through 16 of Table 4.

16. A purified DNA sequence of claim 10, 11, 12 or 13
selected from the group of DNA sequences consisting of
exon sequences 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
14, 15, 16 and 17 of Table 4.

17. A purified DNA sequence of claim 16 wherein the DNA
sequence is selected from exons 4, 5, 6 and 7.

18. A purified DNA sequence of claim 16 wherein the DNA
sequence is selected from exons 5 and 6.

19. Use of DNA sequences of claims 1, 2, 3, 4, 5, 6, 7,
8, 9, or 10 in the construction of a cloning vector or an
expression vector.

20. NAIP protein encoded by a DNA sequence of claims 1,
2, 3, 4, 5, 6, 7, 8, or 9.
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21. A 15 amino acid fragment of NAIP protein encoded by 
45 sequential bases of the DNA sequence of claim 10. 

22. NAIP protein comprising an amino acid sequence
biologically equivalent to the amino acid sequence of
Sequence ID No. 2.

23. NAIP protein consisting essentially of the amino
acid sequence of Sequence ID No. 2.

24. NAIP protein fragment comprising at least 15
sequential amino acids of Sequence ID No. 2.

25. NAIP protein fragment comprising an amino acid
sequence selected from the group of polypeptides encoded
by one of exons 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 and
16.

26. NAIP protein fragment of claim 25 wherein selected
polypeptides are those encoded by exons 5, 6, 7, 8, 9,
10, 11 or 12.

27. NAIP protein fragment of claim 25 wherein selected
polypeptides are those encoded by exons 5 and 6.

28. Use of amino acid sequences of claims 20, 21, 22,
23, 24, 25 or 26 in production of hybridomas.

29. A method for analyzing a biological sample to
diagnose presence or absence of a gene encoding NAIP
protein, said method comprising:

i) providing a biological sample derived from the
SMA containing region q13 of human chromosome 5;

ii) conducting a biological assay to determine
presence or absence in said biological sample of at least
a member selected from the group consisting of:

a) NAIP DNA Sequence ID No. 1, and
b) NAIP protein Sequence ID No. 2.

30. A method of claim 29 for diagnosing a human's risk
of developing SMA wherein in step ii) mutations in said
sequence of group a) or b) are assayed.

31. A method of claim 30 wherein presence or absence of
exons 5 and 6 of group a) are assayed.

32. A method of claim 31 further comprising determining
intact gene copy number in chromosome 5 of a gene
encoding said NAIP protein.

33. A method of claim 29, 30, 31 or 32 wherein said
biological assay comprises an assay selected from the
group consisting of DNA hybridization, restriction enzyme
analysis, PCR amplification, mRNA detection and DNA
sequencing.

34. A method of claim 29, 30, 31 or 32 wherein said
biological assay comprises PCR amplification of exon
regions 5 and 6 by use of PCR primers selected from the
5' region of exon 5 and 3' region of exon 6.